Split Text to Postgres Table

Introduction

This lesson teaches how to split text to a Postgres table. As we lead up to using the INSERT INTO SQL command, we’ll first study PostgreSQL’s Split_Part function that is key to splitting text into an array of items, how PostgreSQL arrays work, as well as how to convert data types using casting.

Postgres Arrays

Arrays are groups of more than one item. An Array can save work and time, increase development ease, and more efficiently utilize resources such as processor, memory, and storage. In Postgres, an array looks like “CPU, GPU, RAM, Motherboard, Case, Power supply” and “2, 4, 6, 8, 10”. We use “Items” to mean each individual item in an array.

This lesson’s examples will teach first all about what a PostgreSQL array is and how it works.

Postgres Array Syntax and Examples

In any given point where data is stored in a PostgreSQL table, where a row and column meet, which we can call many things and will call a “cell” for this article: instead of an integer, decimal, boolean, varchar, etc., we can also have an array, which represents many items within it. Here’s an example of how we create a Postgres table with a few of these data types, including an array:

CREATE TABLE tbl_computer_parts (
    t_name_part text
    , arr_i_clock_speeds DECIMAL[]
)

Analysis

Studying the above SQL, the data type of “text” we used for the column we named “t_name_part” should be familiar. The column we called “arr_i_clock_speeds” is a PostgreSQL array data type. The bracket symbols “[]” tell Postgres the column as an array type where each item in that array is a decimal data type; so it stores numbers. So, for example, if the part is a CPU, it may have two clock speeds, “base” and “turbo”, which can both be stored in arr_i_clock_speeds as [3.6, 4.2].

Let’s put that into action by adding some data to tbl_computer_parts so we can get a more clear mental picture of how it works.

INSERT INTO tbl_computer_parts
    (
    t_name_part
    , arr_i_clock_speeds
    )
VALUES
    (
    'AMD Ryzen 3900X'
    , ARRAY [4.4, 4.8]
    )

FAQ: “Why not just create two decimal columns called something like, “n_clock_speed_base” and “n_clock_speed_turbo”? GREAT question! Answer: Using an array, we can have as many items as we want stored in this column; we are not limited to just two. We are planning for scaleability, in case some parts have three or even more clock speeds associated with the part. Which begs the question, “But then how do we keep track of what each number means? That’s where we might choose to create another column, text in nature, called something like “arr_t_clock_speeds” that might store data like [‘base’, ‘turbo’]. Or, more efficiently, it would store numbers instead of text, where 0 represents base and 1 represents turbo. If we went this route, we’d probably want to create a new table called “tbl_clock_speed_types” and link it with a foreign key to tbl_computer_parts.

Getting back on track: Let’s fast forward as if we have put in more than one row so we can now query the tbl_computer_parts table to see the data:

SELECT
    t_name_part text
    , arr_i_clock_speeds DECIMAL[]
FROM
    tbl_computer_parts

Returns

t_name_partarr_i_clock_speeds
AMD Ryzen 3900X4.4, 4.8
Intel 9900K4.6, 5.0

NOTE: The above results pulled from the arr_i_clock_speeds column in tbl_computer_parts have “{” and “}” brackets that we removed, for increased clarity.

PostgreSQL array index

We will now study how to read/write individual array items in PostgreSQL. We’ll call the number pointing to each array item an index. Here we will tell Postgres we want the first item in the array for every row – giving us base clock speed – by putting a “1” in brackets:

SELECT
    t_name_part text
    , arr_i_clock_speeds[1]
FROM
    tbl_computer_parts;

Query Results

t_name_partarr_i_clock_speeds
AMD Ryzen 3900X4.4
Intel 9900K4.6

Important

Notice above that – unlike how in most systems the first item is a zero, in Postgres arrays, we begin with a “1” to represent the first array item.

Multidimensional arrays in Postgres

So far we have learned how to use a single dimensional array to store as many items as we want along one axis. If you want to store more than one set of values, you use similar syntax and call it a multidimensional array.

PostgreSQL Multidimensional array example

SELECT
    t_name_part text
    , arr_i_clock_speeds[x][y]
FROM
    tbl_computer_parts

For a moment, let’s go in a slightly different direction to see another way of creating PostgreSQL arrays:

DECLARE
    arr_parts TEXT[] := ARRAY['CPU', 'Motherboard', 'GPU'];

Back on track again: What if we have a string that we need to convert into an array? This is where PostgreSQL’s Split_part() function comes in handy!

PostgreSQL Split_Part Syntax and Examples

First, let’s learn how to use PostgreSQL’s Split_part() via a study of its syntax and an example of its use.

Syntax of PostgreSQL Split_Part

SPLIT_PART(text, delimiter, POSITION)

Analysis

  • text: We hand a string of text into the function as the first parameter. This text would probably have commas, semi-colons, dashes, periods, or some other delimiter, as explained below.
  • delimiter: This parameter is the character used to delineate, such as a comma, as mentioned above.
  • position: Which item (counted via delimiters) in that string to return.

Example of PostgreSQL Split_Part

SELECT SPLIT_PART("CPU, Motherboard, GPU", ",", 2);

The above statement returns “Motherboard” because that is the second item in the comma-delineated text string of “CPU, Motherboard, GPU”.

One way to store and retrieve data is to start with a string of text and use the Split_part function to get the parts of that string (text) that we want. One use for this functionality is to pull parts of a date out of a full date. For example, if you convert a date object into a string like “2021-02-23” and you want to extract the month. You would split_part(“2021-02-23”, “,”, 2) to get “02” because it is the second item in that string of text.

For an exercise, look at the data below in tbl_computer_parts:

ID_partd_produced
1002021-02-03
1022021-02-05
1042021-02-07
1062021-02-09
1082021-02-11

If we want to separate year, month, and day into separate columns for every row in tbl_computer_parts, PostgreSQL’s Split_Part function is exactly what we need. Bonus: we’ll also learn how casting works.

SELECT
    ID_part
    , d_produced
    , split_part(d_produced::TEXT, "-", 1)::INTEGER AS i_year
    , split_part(d_produced::TEXT, "-", 2)::INTEGER AS i_month
    , split_part(d_produced::TEXT, "-", 3)::INTEGER AS i_day
FROM
    tbl_computer_parts;

Results

ID_partd_producedi_yeari_monthi_day
1002021-02-03202123
1022021-02-05202125
1042021-02-07202127
1062021-02-09202129
1082021-02-112021211

Analysis

Examining the second row, where ID_part is 102. If you look at the value in the d_produced column, you see “2021-02-05”. The delineator here is a dash symbol. So, to retrieve the year, we know it is the first item in that string of text. This is where we use the split_part function. It requires a string (text) to work with and d_producted is a date type. So that is where we “cast”, which is the part in our SQL where you see “::TEXT”. That “casting” converts the entire d_produced column into text. The next parameter is a dash. And for the final parameter, we put a 1 to designate the first item in our little array of items. Finally, we cast the result as an integer so that we can place it in the i_year column which was created as an Integer data type. We do the same to extract month and day from that same d_produced column.

Conclusion

In this lesson, we learned how to split text into PostgreSQL tables. Along the way we learned about PostgreSQL arrays, the INSERT INTO command, and the Split_Part function, which is a key component for splitting text into an array of items. We used a real world scenario of storing and retrieving computer parts in order to fully understand how text splitting works in PostgreSQL.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.