Relations and Schema in Relational Databases

Relational Databases are the most common databases in use today. Since relational databases are made of tables, people often get the impression that they are like a less visual form of an Excel spreadsheet. While they may look similar, the similarities pretty much stop at appearance. When it comes to databases, the relational database is the heavyweight champ an Excel unskilled novice flyweight.

A spreadsheet, if being used as a database, would be considered a flat database; a database consisting of a single table of information. A row, or record, contains all the information about a single entry. Each row intersects a column, dividing the row into fields for each type, or piece, of information held in the row. Since I work in mining, we will consider some hypothetical assay information for a drillhole. The project ID, the drillhole number, from, to, assay type, element, and element value each occupy a field. Such a table would look something like this:

It is pretty easy to look up information in this table. We can quickly tell what elements were assayed, the drillhole ID, the From-To interval, etc. However, this flat database has a problem common to all flat databases: redundancy. Everything is the same with exception of the Element and Element_Value fields. This is inefficient and wastes storage space. A better way to store this data would be to make a relational database from a series of tables.

Project Table

Drillhole Table

Sample Table

By creating these three tables, none of the data are repeated. We have established relationships between the Project Table and the Drillhole Table using the ProjectCode and between the Drillhole Table and the Sample Table through the DrillHHoleID. These ProjectCode, the DrillHole ID, and the SampleID are Primary Keys which are unique values that identify each of these records. Primary keys enforce referential integrity within the database and ensure there are no duplicate records. These relationships are how relational databases get their name.

This relational scheme has two advantages: data storage efficiency and error risk reduction. The data storage savings in negligible in this small example but you can imagine the amount of storage space saved if this were a mining company and there were multiple projects and thousands of drillholes at each one, with each drillhole having 100 – 200 assays for 31 elements. That can really add up. The potential for errors is reduced since information is stored only once, and definitions can be set such that certain fields cascade, or auto populate when entered in one table.

Relationship Models and Schema

The power of relational databases lies in their strict table definitions and for this reason it is important to think about the data you want to store and the relationships between the data. These relationships are defined by a schema, or model, that will provide the structure of your database.

There are three basic types of relationships in relational databases:

one-to-one
one-to-many
many-to-many

It is important to consider your data in these relationships in order design the most effective database possible. A helpful way to think about these relationships is by considering a product sales order database example. In this scenario, each type of relationship would look like the following:

one-to-one – a customer has a single, unique customer ID
one-to-many – a customer may have many orders or transactions
many-to-many – a customer’s order may containing more than one product and a product can be in many orders.

The differences in these relationships requires different structures to ensure efficiency.

One – to -one

The customer ID should be with other descriptive information about the customer, such as address, phone number, and credit card information. This information should be kept together with the exception of certain cases when there are large chunks of infrequently used data, differing security requirements for something like the credit cards, or if there is customer information that might lend to better efficiency if it is separated from other customer data.

One-to-many

This is more easily thought of as many-to-one. In our example think about many orders belonging to one customer; or, one customer having many orders. However it works best for your style of thinking. Customer orders would be tied to purchase numbers by assigning a customer ID to a transaction number. This information would then be connected to the customer name by the customer ID.

Many – to -many

This kind of relationship is best described as multiple many-to-one relationships. Think about this as a collection of customers, a collection of orders, and a collection of products. In this line of thinking, we would create an entity that connects a customer with a transaction and a transaction with products. This associates one customer with one order through multiple instances of many-to-one relationship pairs.

Considerations

Relational databases are best used when you have a good understanding of the types of data and the types of relationships discussed above are well understood. They are rigid structures and that is one of their strengths. However, sometimes you don’t know what you don’t know and then a NoSQL, or non-relational database, may be a better choice.

If this brief overview of relational databases and schema has been helpful to you, please comment below or find me on Twitter.

Image credit