Postgres Substring and Position Functions in SQL

Introduction

In this article, we will learn to use the Postgres Substring and Position functions SQL when manipulating strings to isolate words in multi-word strings, including:

  • What? What do the SUBSTRING and POSITION functions do?
  • How? How do we use these functions in our SQL execution?
  • Why? When do we need these functions? A real world use case is included.
  • Extra We’ll also take a brief look at the related function, REVERSE.

Prerequisites

  • Basic understanding of how to write SQL with PostgreSQL (or MS SQL Server, Oracle, MySQL, etc.), using the free PG Admin tool or some other relational db administration tool, or by via code or script (compiled or not, object-oriented or not) languages like Python, PHP, Java, C#, ASP.Net, Javascript, VB.Net, Ruby, Node, etc.) that allows a connection to your db as well as a method for sending it PL/SQL commands or requests, to query or make changes to a database.
  • Knowledge of the use of the most basic SQL (PL/SQL) statements, including SELECT and FROM.
  • Understanding of what a string is and what concatenation means.

What is the SUBSTRING() function?

The PostgreSQL SUBSTRING function returns a portion of a string. So if you seed the function with something like, “PostgreSQL rocks,” and tell it you want the first 10 characters of that string, the SUBSTRING() function will return “PostgreSQL”.

How do we use this function?

Syntax

t_string_portion = SUBSTRING(t_string_original, i_position_begin, i_characters_to_use);

So, if we want to copy a chunk out of one string into another, we feed the SUBSTRING function three tasty things:

  • The original string, which we call “t_string_original,” above.
  • The position (from left to right only) that we want to begin grabbing characters. We call this “i_position_begin,” in the syntax section above. If we want to start on the right and count left, there is a way to do that, which we’ll talk about later.
  • Finally, we tell the function how many characters to use or grab; which we call “i_characters_to_use” above.

Here’s an image that makes the syntax super clear:

![Postgres syntax for Substring SQL](https://i.gyazo.com/3ff4dceb6c884264c22407e912c6b3f6.png “Postgres syntax for Substring SQL”)

Now we take a small step by filling our syntax with a bit of data:

DECLARE
t_string_original := 'Python is the best coding language.';
BEGIN
  SELECT
  t_string_portion = SUBSTRING(t_string_original, 1, 6);
END

The example above returns “Python” which is a string of the first 1 thru 6 characters of our original string.

Let’s look at a slightly deeper example in SQL of how SUBSTRING works. Here, we want to mine our database for the first word in a value stored in a column that may have multiple words.

Our test table with already filled-in data:

id_techt_namet_category
0OracleDatabase Relational Monolithic
1MongoDatabase NoSQL Flexible
2PostgreSQLDatabase Relational Free Best
3MySQLDatabase Relational Free
4PythonLanguage Newish Flexible Best
5PHPLanguage Old Pervasive
6JavaLanguage Pervasive

What if we want to return the first word in the tcategory field (column) of our database? We _could use the LIKE and “%” operators but we’re here to learn SUBSTRING. Also, we want the first word and we can’t assume we know what that word is, so this is where the LIKE and “%” method breaks down. Here’s the code:

SELECT
  SUBSTRING(t_category, 1, 8) AS t_category_type
FROM
  technologies;

The above query will return the following results:

t_category_type
Database
Database
Database
Database
Language
Language
Language

Now that we have a basic understanding of how the SUBSTRING function is used, we can get even more “real world” with our training.

Why use the SUBSTRING function?

Use case of categorizing database and coding technologies

Again, here is our full table. We’ll name it “technologies”:

id_techt_namet_category
0OracleDatabase Relational Monolithic
1MongoDatabase NoSQL Flexible
2PostgreSQLDatabase Relational Free Best
3MySQLDatabase Relational Free
4PythonLanguage Newish Flexible Best
5PHPLanguage Old Pervasive
6JavaLanguage Pervasive

The initial thing you may have noticed about our first example that returns only one column of data, either “Database” or “Language” and that data is useless to us if not combined with at least an id or name of what tech is being referred to. So we’ll take baby steps and modify that query:

Side note: If you are familiar with normalization, you might look at the above table and think to yourself, “I’d definitely normalize this table by creating a new table called ‘categories’, and link it to this table via a new field in this table called id_category. But that is a topic for another article. We want to keep this as simple and readable as possible.

The new query that gives us the information we need:

SELECT
  id_tech
  , t_name
  , SUBSTRING(t_category, 1, 8) AS t_category_type
 FROM
  technologies;

Here are the recordset of results Postgres returns to us, giving us characters 1 through 8 of the t_category field (column), results renamed to be “t_category_type” by the “AS” operator you see above:

id_techt_namet_category_type
0OracleDatabase
1MongoDatabase
2PostgreSQLDatabase
3MySQLDatabase
4PythonLanguage
5PHPLanguage
6JavaLanguage

Now let’s throw a twist into the mix so we can see how changing the start position to a negative number gives us a begin point at the end of our source data. In this case our reason for doing this is because – again using the “tcategory” column (field) – this time we want to extract the _last word in that phrase.

For your ease, we’ll look again at a return of the full dataset:

id_techt_namet_category
0OracleDatabase Relational Monolithic
1MongoDatabase NoSQL Flexible
2PostgreSQLDatabase Relational Free Best
3MySQLDatabase Relational Free
4PythonLanguage Newish Flexible Best
5PHPLanguage Old Pervasive
6JavaLanguage Pervasive

Before we dive into the SQL script for extracting the last word from that category column, let’s walk through the logic:

  • When we study the functionality of the STRPOS() function, we find there is no way to tell it to count fromt the end of our source string, so we need to think of another way to accomplish our task.
  • Two of the most popular ways to accomplish this task are (1) Using the REVERSE() function and (2) Splitting the source string into an ARRAY. To keep this article as easy, focused, and brief as possible, we’ll show you one of those ways:

Using REVERSE()

A quickie of the syntax of the REVERSE function in Postgres:

t_reversed = REVERSE(t_phrase);

So, using the above syntax, if the value stored in tphrase is “Database Relational Free” because we are right now just studying the row that is identified as id_tech of 3, then t_reversed is filled with “eerF lanoitaleR esabataD”. How does _that help us accomplish our task?

  • We now have the word we are looking for, which is the last word in our original data, i.e., “Free” but yeah, it’s reversed to be “eerF”.
  • The important part is that we have it now as our first word, so we can easily get it with our SUBSTRING function. THEN we can REVERSE it again to make it intelligable. Make sense? Don’t worry. We are going to take it step by step and explain every piece of that logic.

Now we can build our query to extract that last word from the t_category field (column) in our Postgres database:

SELECT
  id_tech
  , t_name
  , REVERSE(t_category) AS t_reversed
  , POSITION(" " IN t_reversed) AS i_position_of_space
  , SUBSTRING(t_category, 1, i_position_of_space - 1) AS t_category_last_word_reversed
  , REVERSE(t_category_last_word_reversed) AS t_category_last_word
 FROM
  technologies;

In case the above code seems confusing, let’s break it down:

(1) REVERSE(t_category) as t_reversed creates a temporary variable called “t_reversed”. The REVERSE() function converted “Database Relational Free” into “eerF lanoitaleR esabataD” and stored that string in “t_reversed”.

(2) POSITION(" " IN t_reversed) as i_position_of_space creates a temp var called “i_position_of_space” and stores a number there, which in this case is “5” because in our t_reversed string of “eerF lanoitaleR esabataD”, the first space found by our POSITION() function is at character number 5. So now “5” (as an integer) is stored in “i_position_space”.

(3) SUBSTRING(t_category, 1, i_position_of_space - 1) AS t_category_last_word_reversed stores that first word, “eerF” in a new temp variable called “t_category_last_word_reversed”. So now we have our last word isolated but in reverse of what we need.

(4) REVERSE(t_category_last_word_reversed) AS t_category_last_word creates a temporary variable called “t_category_last_word”. The REVERSE() function converted “eerF” into “Free” and stored that string as “t_category_last_word”.

Notes

  • In many situations, like arrays, for example, we start counting at zero. So it is important to keep in mind that with SUBSTRING(), we start counting the position with a “1” or “-1” if starting from the end..
  • Not to be confused with the SubStr() function that is used in some database systems and languages. This was part of an older Oracle spec, too.
  • Look for our other articles on the POSITION Postgres function.

Conclusion

In this article we explored how to use the SUBSTRING() function in PostgreSQL. We also learned via a real world example. Finally, we touched on some related functions such as REVERSE() and POSITION().

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.