Database Land

December 21, 2007

The Basics (III) - Foreign Keys & Referential Integrity

Filed under: attributes, basics, foreign key, referential integrity — Amr El-Helw @ 3:01 pm

As we already discussed, a database table stores the information that is related to a specific entity type. Different entities are stored in different tables, with each table having its own structure, and own attributes related to that entity. In most situations, the entities in these tables are not isolated, but are linked with some kind of relationship. For example, in a corporate environment, we might have a table for employees and a table for departments. However, departments contain employees, or employees work in departments. That is the link between the two entity types.

In databases, this “link” is represented by what we call a foreign key relationship. For example, consider the two tables shown. We have a customers table and an Orders table. Each table has its own primary key (which is called ID in both tables). So what is the link between those tables? Well, each order is made by one customer. A customer can make any number of orders. Notice the attribute Cust_ID in the Orders table. This attribute indicates the customer that made each particular order. So for example, order no. 2 is made by customer whose ID is “024″ (i.e., Bauer). The values in the Cust_ID column refer to the values in the ID column of the Customer table. The Cust_ID column is called a Foreign Key, since it refers to the key (primary key) of another (foreign) table.

So why do we need to do that? Why not just put the customer names in the Order table? The actual reason is called Normalization, which as a whole different topic that we might discuss later on. But for now, think of it as follows: Let’s say the company found a spelling mistake in Mr. Jack Bauer’s name (customer no. 024). If the customer name was placed with each single order, then the company would have to go and correct that spelling mistake in every order made my that customer. However, using foreign keys, the company will have to correct the mistake only in the Customers table, and the Orders will still be referring to the correct person.

This leads us to a closely-related concept, which is Referential Integrity. As we mentioned, the Cust_ID column (foreign key) references the ID column (primary key) of the Customer table. The referential integrity rule states that each value in the referring column (Cust_ID) must exist in the referenced column (ID). In other words, we cannot place an order with Cust_ID = 034 without having a customer in the Customers tables with ID = 034.

However, referential integrity might be violated for example, if we try to delete a row from the customer table (when there are orders made my that customer). Databases have multiple ways of ensuring that referential integrity still holds. One common way is to prevent (restrict) deletion in this case. For example, if a user tries to delete customer no. 024 from the Customer table, the database will not perform that action, but will give an error message instead. Another common way is to cascade the deletion process. So, in the above example, when the user tries to delete customer no. 024, the database will perform that action, but it will also delete all orders placed by that customer. So in the end, referential integrity still holds. The choice of which strategy to use can be defined by the user, and depends on the context of the data.

December 9, 2007

The Basics (II) - Database Tables

As mentioned in the previous post, the data in a database are organized into tables. A table (also referred to as a relation) is a set of data elements (values) that is organized using a model of horizontal rows and vertical columns. The data in a table is usually all related to some entity type. Each column in the table represents an attribute of that entity, while each row (tuple) represents an instance of that entity. A table has a specified number of columns but can have any number of rows.

Columns (attributes) can be of different types. Possible types include text, numeric, date & time, binary objects, etc. For example, in Figure 1, the “Last name” column usually contains textual data while the “salary” column contains numeric data.

Figure 1 - Employee Table
Figure 1 - Employee Table

Each table usually has one of its columns defined as the primary key (or simply the key) of the table. The primary key can be defined as follows: Given a particular value of that column, one can always identify a single row in that table. For example, assume that the table in figure 1 contains an additional column “ID”, and assume that each employee has a unique ID. Now if the user looks for any ID value, the system can find the single row corresponding to that ID (since no 2 employees can have the same ID). Any column that satisfies this condition is called a candidate key. One of these candidate keys is selected to be the primary key of the table. On he other hand, the column “Salary” is not a candidate key; given a salary value, one cannot identify a single employee with that salary. As a result, the primary key column cannot contain duplicate values. In some cases, however, the primary key is set as a group of columns, instead of a single column. Each of these columns might contain duplicates. However, each combination of values across the whole group of columns is unique. In this case, this primary key is called a compound key or a composite key.

Blog at WordPress.com.