Postgres Substring and Position Functions in SQL
Introduction
In this article, we will learn to use the Postgres Substring and Position functions SQL when manipulating strings to isolate words in multi-word strings, including:
- What? What do the
SUBSTRING
andPOSITION
functions do? - How? How do we use these functions in our SQL execution?
- Why? When do we need these functions? A real world use case is included.
- Extra We’ll also take a brief look at the related function,
REVERSE
.
Prerequisites
- Basic understanding of how to write SQL with PostgreSQL (or MS SQL Server, Oracle, MySQL, etc.), using the free PG Admin tool or some other relational db administration tool, or by via code or script (compiled or not, object-oriented or not) languages like Python, PHP, Java, C#, ASP.Net, Javascript, VB.Net, Ruby, Node, etc.) that allows a connection to your db as well as a method for sending it PL/SQL commands or requests, to query or make changes to a database.
- Knowledge of the use of the most basic SQL (PL/SQL) statements, including
SELECT
andFROM
. - Understanding of what a string is and what concatenation means.
What is the SUBSTRING() function?
The PostgreSQL SUBSTRING
function returns a portion of a string. So if you seed the function with something like, “PostgreSQL rocks,” and tell it you want the first 10 characters of that string, the SUBSTRING() function will return “PostgreSQL”.
How do we use this function?
Syntax
1 | t_string_portion = SUBSTRING(t_string_original, i_position_begin, i_characters_to_use); |
So, if we want to copy a chunk out of one string into another, we feed the SUBSTRING
function three tasty things:
- The original string, which we call “t_string_original,” above.
- The position (from left to right only) that we want to begin grabbing characters. We call this “i_position_begin,” in the syntax section above. If we want to start on the right and count left, there is a way to do that, which we’ll talk about later.
- Finally, we tell the function how many characters to use or grab; which we call “i_characters_to_use” above.
Here’s an image that makes the syntax super clear:

Now we take a small step by filling our syntax with a bit of data:
1 2 3 4 5 6 | DECLARE t_string_original := 'Python is the best coding language.'; BEGIN SELECT t_string_portion = SUBSTRING(t_string_original, 1, 6); END |
The example above returns “Python” which is a string of the first 1 thru 6 characters of our original string.
Let’s look at a slightly deeper example in SQL of how SUBSTRING
works. Here, we want to mine our database for the first word in a value stored in a column that may have multiple words.
Our test table with already filled-in data:
id_tech | t_name | t_category |
---|---|---|
0 | Oracle | Database Relational Monolithic |
1 | Mongo | Database NoSQL Flexible |
2 | PostgreSQL | Database Relational Free Best |
3 | MySQL | Database Relational Free |
4 | Python | Language Newish Flexible Best |
5 | PHP | Language Old Pervasive |
6 | Java | Language Pervasive |
What if we want to return the first word in the tcategory field (column) of our database? We _could use the LIKE and “%” operators but we’re here to learn SUBSTRING
. Also, we want the first word and we can’t assume we know what that word is, so this is where the LIKE and “%” method breaks down. Here’s the code:
1 2 3 4 | SELECT SUBSTRING(t_category, 1, 8) AS t_category_type FROM technologies; |
The above query will return the following results:
t_category_type |
---|
Database |
Database |
Database |
Database |
Language |
Language |
Language |
Now that we have a basic understanding of how the SUBSTRING
function is used, we can get even more “real world” with our training.
Why use the SUBSTRING function?
Use case of categorizing database and coding technologies
Again, here is our full table. We’ll name it “technologies”:
id_tech | t_name | t_category |
---|---|---|
0 | Oracle | Database Relational Monolithic |
1 | Mongo | Database NoSQL Flexible |
2 | PostgreSQL | Database Relational Free Best |
3 | MySQL | Database Relational Free |
4 | Python | Language Newish Flexible Best |
5 | PHP | Language Old Pervasive |
6 | Java | Language Pervasive |
The initial thing you may have noticed about our first example that returns only one column of data, either “Database” or “Language” and that data is useless to us if not combined with at least an id or name of what tech is being referred to. So we’ll take baby steps and modify that query:
Side note: If you are familiar with normalization, you might look at the above table and think to yourself, “I’d definitely normalize this table by creating a new table called ‘categories’, and link it to this table via a new field in this table called id_category. But that is a topic for another article. We want to keep this as simple and readable as possible.
The new query that gives us the information we need:
1 2 3 4 5 6 | SELECT id_tech , t_name , SUBSTRING(t_category, 1, 8) AS t_category_type FROM technologies; |
Here are the recordset of results Postgres returns to us, giving us characters 1 through 8 of the t_category field (column), results renamed to be “t_category_type” by the “AS” operator you see above:
id_tech | t_name | t_category_type |
---|---|---|
0 | Oracle | Database |
1 | Mongo | Database |
2 | PostgreSQL | Database |
3 | MySQL | Database |
4 | Python | Language |
5 | PHP | Language |
6 | Java | Language |
Now let’s throw a twist into the mix so we can see how changing the start position to a negative number gives us a begin point at the end of our source data. In this case our reason for doing this is because – again using the “tcategory” column (field) – this time we want to extract the _last word in that phrase.
For your ease, we’ll look again at a return of the full dataset:
id_tech | t_name | t_category |
---|---|---|
0 | Oracle | Database Relational Monolithic |
1 | Mongo | Database NoSQL Flexible |
2 | PostgreSQL | Database Relational Free Best |
3 | MySQL | Database Relational Free |
4 | Python | Language Newish Flexible Best |
5 | PHP | Language Old Pervasive |
6 | Java | Language Pervasive |
Before we dive into the SQL script for extracting the last word from that category column, let’s walk through the logic:
- When we study the functionality of the STRPOS() function, we find there is no way to tell it to count fromt the end of our source string, so we need to think of another way to accomplish our task.
- Two of the most popular ways to accomplish this task are (1) Using the REVERSE() function and (2) Splitting the source string into an ARRAY. To keep this article as easy, focused, and brief as possible, we’ll show you one of those ways:
Using REVERSE()
A quickie of the syntax of the REVERSE
function in Postgres:
1 | t_reversed = REVERSE(t_phrase); |
So, using the above syntax, if the value stored in tphrase is “Database Relational Free” because we are right now just studying the row that is identified as id_tech of 3, then t_reversed is filled with “eerF lanoitaleR esabataD”. How does _that help us accomplish our task?
- We now have the word we are looking for, which is the last word in our original data, i.e., “Free” but yeah, it’s reversed to be “eerF”.
- The important part is that we have it now as our first word, so we can easily get it with our
SUBSTRING
function. THEN we canREVERSE
it again to make it intelligable. Make sense? Don’t worry. We are going to take it step by step and explain every piece of that logic.
Now we can build our query to extract that last word from the t_category field (column) in our Postgres database:
1 2 3 4 5 6 7 8 9 | SELECT id_tech , t_name , REVERSE(t_category) AS t_reversed , POSITION(" " IN t_reversed) AS i_position_of_space , SUBSTRING(t_category, 1, i_position_of_space - 1) AS t_category_last_word_reversed , REVERSE(t_category_last_word_reversed) AS t_category_last_word FROM technologies; |
In case the above code seems confusing, let’s break it down:
(1) REVERSE(t_category) as t_reversed
creates a temporary variable called “t_reversed”. The REVERSE() function converted “Database Relational Free” into “eerF lanoitaleR esabataD” and stored that string in “t_reversed”.
(2) POSITION(" " IN t_reversed) as i_position_of_space
creates a temp var called “i_position_of_space” and stores a number there, which in this case is “5” because in our t_reversed string of “eerF lanoitaleR esabataD”, the first space found by our POSITION() function is at character number 5. So now “5” (as an integer) is stored in “i_position_space”.
(3) SUBSTRING(t_category, 1, i_position_of_space - 1) AS t_category_last_word_reversed
stores that first word, “eerF” in a new temp variable called “t_category_last_word_reversed”. So now we have our last word isolated but in reverse of what we need.
(4) REVERSE(t_category_last_word_reversed) AS t_category_last_word
creates a temporary variable called “t_category_last_word”. The REVERSE() function converted “eerF” into “Free” and stored that string as “t_category_last_word”.
Notes
- In many situations, like arrays, for example, we start counting at zero. So it is important to keep in mind that with SUBSTRING(), we start counting the position with a “1” or “-1” if starting from the end..
- Not to be confused with the
SubStr()
function that is used in some database systems and languages. This was part of an older Oracle spec, too. - Look for our other articles on the
POSITION
Postgres function.
Conclusion
In this article we explored how to use the SUBSTRING()
function in PostgreSQL. We also learned via a real world example. Finally, we touched on some related functions such as REVERSE()
and POSITION()
.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started