I've been working with relational data bases now for over fifteen years. In all that time, I have had to modify (to be read: Fixed Up) hundreds of systems. Nine times out of ten times, the problem with the system could be traced back to a bad data model definition. Every part of a data system is sitting square on the back of the data model. This means that a bad data model will effect every aspect of your system.
Therefore, I can say with confidence, that the most important part of any system is the underlining data structure. The problem is that most people developing a system (even professionals) spend the least amount of time analyzing the data structure. Remember that no matter how much frosting you put on a bad cake, it's still going to taste bad.
This brings us to our topic for the first ever "Cliff's Corner" column, "DATA NORMALIZATION". This concept was first spelled out by a gentlemen named Cobb at IBM back in the early 70's. The concept was tweaked, adjusted, modified and altered until it took a 300 page book to describe it. However, it really isn't that difficult a concept in real life. In this article, I will be eliminating the buzz words, jargon, and techno babble and just stick to what you need for a practical application. The examples below are using the Paradox Data Base Engine, but the principals described here will apply to most relational data bases.
The key to all normalization is a "One to One Relationship" between every piece of data in the table and the primary key for that table. A primary key is a field or set of fields at the top of the table that UNIQUELY identifies each record. So, the key for a "Customer" data table could be "Customer Number", because each customer will be assigned a unique number. Choosing a good key could be the single most important part of your data planning. What makes a good key? More importantly, what doesn't? Is a phone number a good key? No, phone numbers change. Once a record is set, you really don't want to change the key if possible. How about last names? This is not a good key for two reasons. First, names change (marriage, divorce, adoption, etc). Secondly, how many Smith's do you know. A last name is not very unique. Even if you add the first name to the primary key, there will quite likely be more than one Bob Smith. Does social security number make a good key? Yes, it is unique and does not change over time. Often the best key is something that we create ourselves, like a customer number. This makes a good key because we are in control of it.
Now that we have a good key for our table, let's revisit the concept of one to one relationships. The simplest way to explain this is with an example. Let's look at an invoice data structure, a data model most of us will be familiar with. We start with a main table called "INVOICES", that has a unique key, "Invoice Number". Now we begin to add fields to the table. As we add each field we need to ask ourselves three important questions.
Does this piece of data relate to the key?
Is there only one piece of data like this one for this key?
Will this information NOT be repeated in other records?
If the answer to any of these questions is no, this piece of data DOES NOT belong in the Invoices table. Let look at some sample additional fields to our table.
Invoice Date
Does this piece of date relate to the key? Yes, each invoice has a date that is was create.
Is there only one piece of data like this one for this key? Yes, there is only one date for an invoice number.
Will this information NOT be repeated in other records? Yes, this data is unique to this record.
Invoice Date got three "Yes" answers so it is unique to the Invoices table. Add the data field to the invoice table.
Customer Name
Does this piece of date relate to the key? Yes, each invoice has a customer name attached to it.
Is there only one piece of data like this one for this key? Yes, there is only one customer per invoice number.
Will this information NOT be repeated in other records? NO, this customer name will be used on other invoices.
The answer to question 3 is "NO". The customer name will be repeated on many invoice records. We do not want to retype the customer name on every invoice again and again. Plus, the name could be typed differently on each record (eg. Ford Motor, Ford Motor Co.) possibly causing reporting problems later in your application. This information needs to be in a different table.
Item Description
Does this piece of data relate to the key? Yes, each invoice item has a description.
Is there only one piece of data like this one for this key? NO, there is more than one item per invoice number.
Will this information NOT be repeated in other records? Yes, this data is unique to this record.
The answer to question 2 is "NO". There is more than one item for each invoice. A dead give away for this problem is, if you start putting numbers on the back of your field names (eg. Item Description 1, Item Description 2, ...). Once again, this information needs to be placed in another table.
Employee Withholding
No, invoices do not need this information. Enough said. Do not put this data in the Invoices table.
Try this exercise. Below is a listing of the information we will need for our invoice. For each field, decide if the information belongs in the "INVOICES" table and if not, what question or questions received a "NO" answer. No peeking at the answers below.
Invoice # (Primary Key) _______
Invoice Date _______
Customer Number _______
Customer Name _______
Customer Address _______
Customer Phone Number _______
Purchase Order Number _______
Purchase Order Date _______
Purchase Order Amount _______
Ship To Address _______
Item Number _______
Item Name _______
Item Description _______
Item Quantity _______
Item Price _______
Sales Tax _______
The correct fields to be placed in the INVOICES table are Invoice #, Invoice Date, Customer Number, and Purchase Order Number. That's it, only four fields. I hope you got this right. If you only got two fields and you missed either Customer Number or Purchase Order Number, don't feel bad. We haven't really talked about linking fields yet. A linking field is a data item that is used to point to a unique record in another table. When we print out our invoice we will need information like the Customer's Name. Something in our invoice table needs to tell us what customer is buying the products. Therefore, we need to create a new table called "CUSTOMER", that will contain this information. The Customer table will have a primary key called "Customer Number". By placing a linking field into your invoice table, when we print out the invoice we will have access to all the other fields in the Customer table. So our customer table structure looks like this:
Customer Number (Primary Key)
Customer Name
Customer Address
Customer Phone Number
Note that the primary key (all the fields) needs to be in the invoice table to identify the correct customer. Once we have entered the Customer Number into our Invoice table, the customer's name, address, and phone number becomes available to any forms or reports using our data model.
Thanks for reading "Cliff's Corner". Next month I will continue our discussion about how to set up a data model, including what happened to all those product items in our invoice. We will see how to set up a Paradox data model and incorporate them into a form. Hope to see you then.