Normalize Postgres Database with Foreign Keys
In this article we will learn to normalize a Postgres database with foreign keys (one-to-many relationships). Before relational databases were developed, we stored data much like you see in spreadsheets, where you typically have duplicate data like customer names for example. Imagine a table called “orders” that has a row for every order. If a customer buys more than once, you would have that customer’s name, address, etc. duplicated every time. This is where relational databases come to the rescue! This merely means we create another table called “users” that has a unique ID for every user in our PostgreSQL database. We then create “one-to-many” table relationships (Foreign Keys) between the orders table and the users table by linking the ID in the users table to a column in the orders table called something like ID_user using a “foreign key”. Now, any time that same user buys a product, the ID representing that user is placed in the ID_user column, instead of their full name. See how this saves space (text vs. integer), as well as potential for typos?
In this article, we’ll explore how to normalize PostgreSQL databases in a variety of situations where one-to-many relationships between tables are useful or even necessary. We’ll use the following structure:
- What? What is normalization? What do one-to-many table relationships look like?
- How? How do we normalize a Postgres database and what are some situations where we would?
- Comprehension of the use of commonplace SQL commands, including
- Knowledge of what index, integer, string, and text are and how they work.
What is normalization?
We’ll build an understanding of normalization by first looking at a data table from pre-normalized times.
|110||Book: Best way to design a database||Susie Query||284 Oracle Ave, Austin, TX||$16.95||2019-10-15|
|185||License: Microsoft Windows 10||Stan Thuman||5633 Python Blvd, Austin, TX||$59.95||2019-10-16|
|186||License: Microsoft Office 365||Stan Thuman||5633 Python Blvd, Austin, TX||$40.00||2019-10-16|
|201||Course: Data Compression Algorithms||Scott McSQL||29 Postgres Help, Austin, TX||$200.99||2019-10-17|
The first thing to notice here is that Stan Thuman ordered two items, so he shows up twice in our table. Unfortunately, in a flat database system, this means quite a bit of information gets replicated; in this case his name and address. Another way data replication would happen here: Imagine if someone else ordered the “License: Microsoft Windows 10” product, then that information, which we see in the “t_product” column, would also be replicated. See how inefficient that method can be? Especially if you consider we are leaving out many columns that would be necessary to an order tracking system, including various product properties such as weight, color, size, etc. And with the user, we’d need attributes like phone number and email address.
Now if we want to normalize that table, the bare minimum would be to turn it into three tables. We recommend the five you see here:
Here’s an image of those five tables, using arrows (“joins” and “foreign keys”) to show relationships between tables, which are always indexes (a special kind of index in this case called a “foreign key”). What is the significance of the direction of the arrows? In this case, the arrows point to the table that is the “dependee” for looking up information. Meaning, the table getting pointed to is the one being depended on to provide data. So, for example, notice the arrow pointing from the “id_user” column in the “cart” table to the “id” column in the “users” table. This relationship fixes the issue we saw above where we have to repeat user information for every order. After the image below, we’ll do more analysis, so you can more deeply understand how normalization benefits your Postgres database efficiency.
Another arrow (“join” or “foreign key”) to understand is the one pointing from the “id_product” column in the “cart” table to the “id” column in the “products” table. This allows us to replace a string of text in our “orders” (now purchases in our new normalized database) table above with an id pointing to the “products” table, so we can massively reduce repetition, which means a reduction in overall database size, reduction of potential typos, and increase in overall efficiency.
The final arrow to understand in the above diagram is the one pointing from the “id_user” column in the “purchases” table to the “id” column in the “users” table. Much like the relationship between “cart” and “users”, this relationship between “purchases” and “users” allows us to remove repetitive user data from the “purchases” table.
Note that we have fleshed out the above relationships diagram with many other fields who’s use may not be immediately obvious:
users.id_session: In many types of applications we have more than one user at a time using the app. We use a “session variable” to track and keep users separate from each other. For example, if a user is added to the “users” table, after adding that user, we want an easy way to determine the new “id” created in that table, we can use “id_session” to find out via a SELECT statement.
cart.id_coupon: Why did we add this column? Two reasons: (1) So you can extend this lesson on your own – call it homework – by creating a table called “coupons” and relating it to both the “cart” table and the “purchase_details” table; and (2) In order to give you a “boost” in building your own shopping cart system, where you may not have thought of coupons as an option for shoppers.
products.id_category: This is another column, like “cart.id_coupon”, where we are giving you some homework (create a table called categories) and a leg up for creating your own future ecommerce solution.
products.id_sub_cat: This is another column, like “products.id_category”, where we are providing you with some homework (create a table called categories_sub) and a boost for creating a e-commerce solution.
- There are types of arrow, otherwise known as “join” and “foreign key”, that we did not study here. Please feel free to continue your learning of this topic by studying the various kind of joins that exist. Stay tuned because we intend to provide more on this topic in the future.
- If you wonder why we prefixed some of our fields, variables, and columns with “i” or “t“? In this article, we used “i” to mean integer and “t” to mean text or string. Here is a short tutorial on that topic.
In this tutorial we learned why how and why to normalize your Postgres with foreign keys (one-to-many relationships) in order to increase accuracy, ease, and efficiency. We also provided a real world ecommerce example to help deepen your understanding of database normalization.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.Get Started