Join Tables in Postgres with Python
In this article, we will learn how to join tables in Postgres using Python to write the SQL. During this tutorial, we’ll use the following structure:
- What? What does a table join in PostgreSQL do and what is the syntax?
- How? How, why, and when do we best use the JOIN clause in our PostgreSQL SQL commands?
- Basic of understanding of how to write SQL for Postgres or similar relational databases like MS SQL Server, Oracle, and MySQL because SQL is similar across all these technologies.
- Comprehension of the use of common SQL statements, including
What is Normalization
Before we dive in to learning how to join tables in Postgres using Python, we’ll take a brief tour of database normalization and why it is important.
Prior to the development of relational databases, we managed data in a manner similar to spreadsheets, where you would often end up having duplicate data. How? Think of a table called “purchases” which includes a record for every order. When a customer buys more than once, you get that customer’s name, phone, etc. duplicated for each order. This is where a switch to the relational database (RDB) model shines. With an RDB we create a table called “users” that has a unique ID for every user in our PostgreSQL database. We set up “one-to-many” table relationships using “Foreign Keys” to connect the orders table and the users table by relating the ID in the users table to a column in the orders table called something like ID_user. Whenever that same user buys a product, the ID representing that user is placed in the ID_user column of the “Orders” table, instead of their full name.
|112||Book: Relational Databases||Frank Query||284 Postgres Ave, Austin, TX||$15.25||2019-11-05|
|184||License: MS Windows||Fran Chuman||5633 Python Blvd, Austin, TX||$59.95||2019-11-06|
|185||License: MS Office||Fran Chuman||5633 Python Blvd, Austin, TX||$40.00||2019-11-06|
|203||Course: Postgres Table Joins||Jim McSQL||21 Postgres Help, Austin, TX||$180.99||2019-11-07|
The first thing to notice here is that Fran ordered two products, so she shows up twice in the table above. In a flat database system like you see here, quite a bit of data gets replicated; in this case her name and address. Another data replication issue: Imagine if someone else ordered the “License: MS Windows” item, then that information, which we see in the “t_product” column, might also be replicated. See the problems with this method?
If we want to normalize that table, the minimum would be to turn our flat “purchases” table into three tables.
Here’s an image of three tables, using arrows to represent “joins” to show relationships between tables, which are indexed. What is the significance of arrow direction? The arrows point to the table that is the “dependee” for data look-up; the table getting pointed at is the one being depended on to provide data. Notice here that the arrow pointing from the “id_user” column in the “cart” table points to the “id” column in the “users” table. This link fixes our problem above with flat databases where we repeat user information for every order.
(https://gyazo.com/460c46ab1921c14793561eeba16d8338 “Join Tables in Postgres”)
The next arrow to study from the image above is the one pointing from the “id_user” field in the “purchases” table to the “id” column in “users”. This relationship between “users” and “purchases” keeps repetitive user data out of our “purchases” table.
Next, we’ll look at a powerful relational database construct called “nested select” that can often be used instead of JOIN, so it is important to understand both ways.
What is a nested select?
Sometimes we may wish to acquire data from a subset of records we build on the fly, insert into a table, or update a table using a data subset. These are often called “subqueries”.
Let’s study the syntax of a nested select used in the WHERE clause of a query:
Syntax of “nested select” in the WHERE clause:
column_1 IN -- can be "IN", "NOT IN", "EXISTS, "=", "<", and more.
ORDER BY column_1
In the example above, we examine two data sets that are both pulled from the same table, “tbl_data”. The high level view is that the “inner” nested data set runs first and the “outer” query is limited based on the results of that inner, nested query returns. The IN you see here is important. It’s telling Postgres to retrieve only records from tbl_data where the value in column_1 exists in the data returned by the nested (INNER) query.
Syntax of “nested select” in the FROM clause:
) AS nested
In this example, we gave a name to the “inner” nested select query, “nested”, so as to refer to it from the outer select query at the beginning of the query. Note that we enclosed the nested select query in parenthesis, which is required.
Now that we have a basic understanding of both normalization and nested queries, we can move on to look at how and why we would join tables for Postgres.
We use a JOIN in Postgres to combine records from more than one table by using values that are common to each table. The types of ways we can join tables in Postgres are CROSS JOIN, INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. We’re going to focus here on INNER JOIN, which is the most oft-used in database development.
First, “INNER JOIN” syntax:
SELECT tbl_1.col_1, tbl_2.col_2, etc.
INNER JOIN tbl_2
ON tbl_1.col_in_common = tbl_2.col_in_common;
- SELECT: Determines which columns are returned from the two tables we are using in this example.
- FROM: One of the tables we are combining via INNER JOIN.
- INNER JOIN: The name of the other table we are working with.
- ON: Crucial! Telling Postgres which fields from each table are the common link. Usually these are indexed.
Next, let’s get more “real world” and set up two tables with test data. Our first table, “tbl_technologies”, will be the table getting “referenced” from the second table.
Now for the table that will be referenced via INNER JOIN. We’ll call it “tbl_users”:
Study the above tables. Notice the field that will link the two? Yes, it’s the “id” field in “tbl_technologies” linking to (join) the “id_tech” field in the “tbl_users” table. The high level goal is to print a list of technologies that each person specializes in, next to their name. Let’s write a query to do this, so we can see how INNER JOIN work to join tables in PostgreSQL with Python.
s = ""
s += "SELECT tbl_users.t_name_user, tbl_technologies.t_name_tech"
s += " FROM tbl_users"
s += " INNER JOIN tbl_technologies"
s += " ON tbl_users.id_tech = tbl_technologies.id;"
- SELECT: Here we are asking for PostgreSQL to return t_name_user and t_name_tech, each from a different table.
- FROM … INNER JOIN: Pointing out the two tables being used.
- ON: Setting the “link” or “relationship” between the two tables to be the “id_tech” field in “tbl_users” and the “id” field in “tbl_technologies”.
Voila! You should now have a good understanding of how to join two tables using common fields. The exact same principle applies whether you are joining two or five tables.
- If you wonder why we prefixed some of our fields, variables, and columns with “i“, “t” or “tbl“? In this article, we used “i” to mean integer, “t” to mean text or string, and “tbl” to mean table. Here is a short tutorial on that topic.
In this tutorial we learned how to join tables in Postgres with Python. As a prerequisite we learned about the difference between the flat database model and the relational database model. In addition, we studied up on how to do NESTED SELECTs (subqueries) in order to obtain similar goals as joining tables. We also created a “realish” example to help round out understanding of how table joining in Postgres works.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.Get Started