Python Word Pattern Search for Postgres

Introduction

In this article we will go over array (list) and string manipulation techniques in Python to word pattern search for Postgres data extraction. This is how the lesson will go:

  • How? We will learn how to leverage Python’s powerful string and list handling to search for patterns of words to return related info from PostgreSQL queries by removing a list of “stop words” from a user’s input.
  • What else? We’ll use the Postgres “SELECT” command, “WHERE” clause, and “LIKE”; create Python functions; and use some of Python’s built-in functions for replace, error trapping, and SQL execution (from the “psycopg2” flask library).

Prerequisites

  • SQL: Basic knowledge of the basics of writing SQL for Postgres. We use dBeaver’s free software for increased efficiency and ease.
  • Python: We’re using Python 3.8. Get the latest Python version here.
  • Tutorial on naming conventions showing why you may want to prefix your variables, column names, table names, etc. as you see done in this article. For example, naming “tvariable” with the “t” prefix to define it as a “text” (string) object and “tbl_” before table names in order to clearly distinguish those objects as tables. The tutorial goes a bit deeper, as well, talking about how to name variables based on a hierarchy system.

Before we get into it, we’ll go over a relevant foundational concept; how to create and use a Python list.

What is a Python List and how does it work?

Short answer: Lists are like Python arrays but with increased flexibility in data type handling.

In Python a list is a group of values (items). Lists can aid in increasing efficiency, as they allow us to easily store, reference, add, remove, and sort groups of values, much like in a Postgres database table. “42, -5, -12, 122” is a list of integers. “Python, Search, Pattern, Postgres, a.i., Words, Pants” is a list of strings. In some programming languages, lists have to contain items where every item in the list must be the same type. In Python’s case, you can have mixed types, like this: “5, Python, -1, Search, 12, Word”. Most coders use the word “Item” to describe individual values in a list. Python lists are a datatype that also have multidimension functionality. We will focus here on one dimensional lists.

list_words = ["Python", "Word", "Search", "Pattern"]

The above code simultaneously creates a list called “list_words” and fills the list with four items. How do we get data from the above Python list?

t_word = list_words[2]
print(t_word)

The value stored in “t_word” becomes “Search” because the count begins at zero for lists. So, unlike how the index begins at “1” in a Postgres Array, it begins at “0” for Python lists. “2” here means the third item in the list. That number is often called “index”, which corresponds nicely to typical coder understanding of programming concepts in other languages.

Let’s loop through the list to see all the values:

list_words = ["Python", "Word", "Search", "Pattern"]
for t_word in list_words
    print(t_word)

Analysis

  • for: We use the Python “for” loop to cycle through every value in “list_words”, setting the value of a local variable called “t_word” to temporarily store the current item in that variable.
  • print: Python’s “print()” function provides output of the value in parens, in this case “t_word”.

The results from the above Python script:

Python
Word
Search
Pattern

Now that we understand Python lists, let’s do a quick lesson on Python’s replace function. If you already understand how to use replace() from Python’s built-in functions, feel free to skip down to creating a “stop word” list with Python.

How the REPLACE function works

Syntax

t_string_source.REPLACE(t_string_to_replace, t_string_replacement, [i_count])

Parameters: – t_string_source: The string we want to search through in order to find and replace a substring within that source text. – t_string_to_replace: The string we are looking for that is somewhere within the source string. – t_string_replacement: The string we want to use to replace the above “string to replace”. – i_count: Optional: This number determines number of replacements to make, starting from the left. If left empty, default is “all”.

Example of use of the Python Replace() method

t_string_source = "Python Word Pattern Search for Postgres"
t_string_to_replace = "Word"
t_string_replacement = "Value"
t_string_source.REPLACE(t_string_to_replace, t_string_replacement)
Print (t_string_source)

The result here should be “Python Value Pattern Search for Postgres”

Let’s move on to setting up a Python list and filling it with “stop words”.

Create LIST in Python and fill with stop words

The first step is to create a list with our “stop words” in it:

list_stop_words = ["a" , "about" , "above" , "after" , "again" , "against" , "all" , "am" , "an" , "and" , "any" , "are" , "as" , "at" , "be" , "because" , "been" , "before" , "being" , "below" , "between" , "both" , "but" , "by" , "could" , "did" , "do" , "does" , "doing" , "down" , "during" , "each" , "few" , "for" , "from" , "further" , "had" , "has" , "have" , "having" , "he" , "he'd" , "he'll" , "he's" , "her" , "here" , "here's" , "hers" , "herself" , "him" , "himself" , "his" , "how" , "how's" , "i" , "i'd" , "i'll" , "i'm" , "i've" , "if" , "in" , "into" , "is" , "it" , "it's" , "its" , "itself" , "let's" , "me" , "more" , "most" , "my" , "myself" , "nor" , "of" , "on" , "once" , "only" , "or" , "other" , "ought" , "our" , "ours" , "ourselves" , "out" , "over" , "own" , "same" , "she" , "she'd" , "she'll" , "she's" , "should" , "so" , "some" , "such" , "than" , "that" , "that's" , "the" , "their" , "theirs" , "them" , "themselves" , "then" , "there" , "there's" , "these" , "they" , "they'd" , "they'll" , "they're" , "they've" , "this" , "those" , "through" , "to" , "too" , "under" , "until" , "up" , "very" , "was" , "we" , "we'd" , "we'll" , "we're" , "we've" , "were" , "what" , "what's" , "when" , "when's" , "where" , "where's" , "which" , "while" , "who" , "who's" , "whom" , "why" , "why's" , "with" , "would" , "you" , "you'd" , "you'll" , "you're" , "you've" , "your" , "yours" , "yourself" , "yourselves"]

Write a Function to remove stop words from a string

def remove_stop_words(t_source):
    for t_what_to_replace in list_stop_words:
        t_source.replace(t_what_to_replace, "")
    t_source.replace("  ", " ")

Analysis

  • def: Creating a function called “remove_stop_words” we can call as often as needed from our main application.
  • for: Looping through every item (“t_what_to_replace”) in our list, “list_top_words”.
  • t_source.replace: For each stop word, remove (replace) it from our primary search phrase (“t_source”).
  • t_source.replace(” “, ” “): At the end, remove any extra spaces. See, if you remove “a” from “You are a human”, then you are left with “you are human”, so that is why we remove extra spaces.

Create test table in Postgres to search against

To begin, we will build a test table in to use for saving user activity in a PostgreSQL database.

CREATE TABLE public.tbl_stuff (
    id serial NOT NULL,
    id_session int4 NULL DEFAULT 0,
    t_content VARCHAR(4096) NULL,
    d_when DATE NULL DEFAULT now(),
    CONSTRAINT tbl_stuff_pkey PRIMARY KEY (id)
);
CREATE UNIQUE INDEX tbl_stuff_id_idx ON public.tbl_stuff USING btree (id);

Note: After creating the table, if the default value for the id column wasn’t set to “nextval(‘tbl_stuff_id_seq’::regclass)“, copy this value into the “id” field. We created that column to be an auto-increment indexed column so that every time we insert a row, the “id” column will increase by one and always be a unique index for “tbl_stuff”.

Next we will write the SQL needed to search for and retrieve data from the “tbl_stuff” table:

Create SQL for search

    s = ""
    s += "SELECT t_content FROM tbl_stuff"
    s += "WHERE"
    s += "("
    s += " t_content LIKE (%t_input)"
    s += ")"
    db_cursor.execute(s, [t_input])
    dbRow = cur.fetchone()
    ID_product = dbRow(0)

Analysis: As you can see, we are using parameterized SQL to SELECT records from the “tbl_stuff” table based on the value in our local “t_input” variable.

Now that we understand the primary parts required to build our “Python Word Pattern Search for Postgres” application, we can look at the full source code listing:

Source code of the Python app for searching Postgres

from flask import Flask
import psycopg2 # for database connectivity
from flask import render_template # to render the error page

app = Flask(__name__)
@app.route("/")

# database credential setup
t_host = "PostgreSQL database host address" # either "localhost", a domain name, or an IP address.
t_port = "5432" # default postgres port
t_dbname = "database name"
t_user = "database user name"
t_pw = "password"
db_conn = psycopg2.connect(host=t_host, port=t_port, dbname=t_dbname, user=t_user, password=t_pw)
db_cursor = db_conn.cursor()

def setUpMainVariables():
    t_input = "Find me a red doggie"
    # in a real application, this variable would not be set up
    #    like a constant as you see here.

def remove_stop_words(t_source):
    # Loop through stop words list
    for t_item in list_stop_words:
        t_source.replace(t_item, "")
    # Replace all instances of two spaces with one space.
    t_source.replace("  ", " ")

def search_db(t_input):
    s = ""
    s += "SELECT t_content FROM tbl_stuff"
    s += "WHERE"
    s += "("
    s += " t_content LIKE (%t_input)"
    s += ")"
    try:
        db_cursor.execute(s, [t_input])
        dbRow = cur.fetchall()
        # Next (not shown here): Show the data in dbRow to your user(s)
    except psycopg2.Error as e:
        t_message = "Postgres Database error: " + e + "/n SQL: " + s
        return render_template("error.html", t_message = t_message)
    db_cursor.close

# Main application
def main():
    setUpMainVariables()
    remove_stop_words(t_input)
    search_db()

# Call main application
main()

if __name__ == "__main__":
    app.run(debug = True)

Conclusion

In this article we learned how to build an application using word pattern search for Postgres, primarily to refine user input by removing “stop words” before using PostgreSQL’s “SELECT”, “WHERE”, and “LIKE” to get the data you need. In addition, we saw how to create and use Postgres tables, Python functions, and how to use Python to detect and branch from “try” error checking. Finally, we shared all the Python source code for this project.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.