Word Search for CockroachDB

Have a Database Problem? Speak with an Expert for Free
Get Started >>

Introduction

In this tutorial document we will go over array (list) and string manipulation techniques in Python to word search for CockroachDB data extraction. This is how the lesson will go:

  • We will learn how to leverage Python’s powerful string and list handling to search for patterns of words to return related info from CockroachDB SQL by removing a list of “stop words” from a user’s input.
  • We will use the Cockroach “SELECT” command, “WHERE” clause, and “LIKE”; create Python functions; and use some of Python’s built-in functions for replace, error trapping, and SQL execution (from the “psycopg2” flask library).

Prerequisites

  • SQL: Novice understanding of the basics of writing SQL for Cockroach. We use dBeaver’s free software for increased efficiency and ease of database management.
  • Python: We’re using Python 3.8. Get the latest Python version here.

Before we get into this tutorial, we will go over a relevant foundational concept; how to create and use a Python List, similar to an Array in its functionality.

Python Lists

A Python List is a group of objects. Lists can aid in increasing efficiency, as they allow us to easily store, reference, add, remove, and sort groups of values, much like in a Cockroach database table. “-42, 15, 412, -92” is a list of integers. “Bugs Bunny, Jam, Beans, Toilet paper, a.i., Moopy, Pants” is a list of strings. In some coding languages, every item in the List must be the same type. In Python’s case, you can have mixed types, like this: “5, Python, -1, Search, 12, Word”. Most programmers use the word “Item” to describe individual values in a list. Python lists are a datatype that also have multidimension functionality. We will focus here on one dimensional lists.

1
listTxtWords = ["Bake", "Fry", "Broil", "Boil"]

The above scripts simultaneously creates a list called “listTxtWords” and fills the list with four items. How do we get data from the above Python list?

1
2
txtWrd = listTxtWords[2]
print(txtWrd)

The value stored in “txtWrd” becomes “Broil” because the count start outs at zero for lists. So, unlike how the index start outs at “1” in a Cockroach Array, it start outs at “0” for Python lists. “2” here means the third item in the list. That number is often called “index”, which corresponds nicely to typical scriptsr understanding of programming concepts in other languages.

Let’s loop through the list to see all the values:

1
2
3
listTxtWords = ["Bake", "Fry", "Broil", "Boil"]
for txtWrd in listTxtWords
    print(txtWrd)

Analysis

  • for: We use the Python FOR loop to cycle through every value in “listTxtWords”, setting the value of a local variable called “txtWrd” to temporarily store the current item in that variable.
  • print: Python’s “print()” function provides output of the value in parens, in this case “txtWrd”.

The results from the above Python code:

1
2
3
4
Bake
Fry
Broil
Boil

Now that we get Python lists, let’s do a brief lesson on Python’s replace function. If you already get how to use Replace from Python’s built-in functions, feel free to skip down to creating a “stop word” list with Python.

REPLACE function

Syntax of Replace

1
txtSourceString_source.REPLACE(txtSourceString_to_replace, txtSourceString_replacement, [intCounter])

Parameters:

  • txtSourceString_source: The string we want to search through in order to find and replace a substring within that source text.
  • txtSourceString_to_replace: The string we are looking for that is somewhere within the source string.
  • txtSourceString_replacement: The string we want to use to replace the above “string to replace”.
  • intCounter: Optional: This number determines number of replacements to make, starting from the left. If left empty, default is “all”.

Example of Replace

Example of use of the Python Replace() method

1
2
3
4
5
txtSourceString_source = "Substitute coconut oil for canola oil."
txtSourceString_to_replace = "coconut"
txtSourceString_replacement = "olive"
txtSourceString_source.REPLACE(txtSourceString_to_replace, txtSourceString_replacement)
Print (txtSourceString_source)

The result here should be “Substitute olive oil for canola oil.”

Now let’s set up a Python list and filling it with “stop words”.

Fill List with stop words

First we will create a list containing our “stop words”:

1
listStopWords = ["a" , "about" , "above" , "after" , "again" , "against" , "all" , "am" , "an" , "and" , "any" , "are" , "as" , "at" , "be" , "because" , "been" , "before" , "being" , "below" , "between" , "both" , "but" , "by" , "could" , "did" , "do" , "does" , "doing" , "down" , "during" , "each" , "few" , "for" , "from" , "further" , "had" , "has" , "have" , "having" , "he" , "he'd" , "he'll" , "he's" , "her" , "here" , "here's" , "hers" , "herself" , "him" , "himself" , "his" , "how" , "how's" , "i" , "i'd" , "i'll" , "i'm" , "i've" , "if" , "in" , "into" , "is" , "it" , "it's" , "its" , "itself" , "let's" , "me" , "more" , "most" , "my" , "myself" , "nor" , "of" , "on" , "once" , "only" , "or" , "other" , "ought" , "our" , "ours" , "ourselves" , "out" , "over" , "own" , "same" , "she" , "she'd" , "she'll" , "she's" , "should" , "so" , "some" , "such" , "than" , "that" , "that's" , "the" , "their" , "theirs" , "them" , "themselves" , "then" , "there" , "there's" , "these" , "they" , "they'd" , "they'll" , "they're" , "they've" , "this" , "those" , "through" , "to" , "too" , "under" , "until" , "up" , "very" , "was" , "we" , "we'd" , "we will" , "we're" , "we've" , "were" , "what" , "what is" , "when" , "when's" , "where" , "where's" , "which" , "while" , "who" , "who's" , "whom" , "why" , "why's" , "with" , "would" , "you" , "you'd" , "you'll" , "you're" , "you've" , "your" , "yours" , "yourself" , "yourselves"]

Remove stop words from a string

1
2
3
4
def removeStopWords(txtSource):
    for t_what_to_replace in listStopWords:
        txtSource.replace(t_what_to_replace, "")
    txtSource.replace("  ", " ")

Analysis

  • def: Creating a function called “removeStopWords” we can call as often as needed from our main application.
  • for: Looping through every item (“t_what_to_replace”) in our list, “list_top_words”.
  • txtSource.replace: For each stop word, remove (replace) it from our primary search phrase (“txtSource”).
  • txtSource.replace(” “, ” “): At the end, remove any extra spaces. See, if you remove “a” from “You are a human”, then you are left with “you are human”, so that is why we remove extra spaces.

Create test table in CockroachDB to search against

To start out, we will build a test table in to use for saving user activity in a CockroachDB database.

1
2
3
4
5
6
7
8
CREATE TABLE public.tblStuff (
    id serial NOT NULL,
    idSession int4 NULL DEFAULT 0,
    txtContent VARCHAR(4096) NULL,
    dateWhen DATE NULL DEFAULT now(),
    CONSTRAINT tblStuff_pkey PRIMARY KEY (id)
);
CREATE UNIQUE INDEX tblStuff_id_idx ON public.tblStuff USING btree (id);

After creating the tblStuff Cockroach table, if the default value for the id column wasn’t set to “nextval(‘tblStuff_id_seq’::regclass)”, copy this value into the “id” field. We created that CockroachDB column to be an auto-increment indexed column so that every time we insert a record, the “id” column will increase by one and always be a unique index for “tblStuff”.

Next we will write the SQL needed to search for and retrieve data from the “tblStuff” table:

Create SQL for search

1
2
3
4
5
6
7
8
9
s = ""
s += "SELECT txtContent FROM tblStuff"
s += "WHERE"
s += "("
s += " txtContent LIKE (%txtInput)"
s += ")"
cursorCRDB.execute(s, [txtInput])
crdbRow = cursorCRDB.fetchone()
IDproduct = crdbRow(0)

Analysis: As you can see, we have used parameterized SQL to SELECT rows from the “tblStuff” CockroachDB table based on the value in our local “txtInput” variable.

Now that we get the primary parts required to build our “Word Search for Cockroach” application, we can examine the full source scripts listing:

Source Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
from flask import Flask
import psycopg2
from flask import render_template

app = Flask(__name__)
@app.route("/")

# database connection setup
txtDBname = "db name"
txtNameUser = "db user name"
txtSSLmode = "auto"
txtSSLrootCert = 'certifs/ca.crt'
t_sslkey = 'certifs/client.maxroach.key'
t_sslcert = 'certifs/client.maxroach.crt'
txtHostURL = "localhost"
txtPortNum = "48386"
connCRDB = psycopg2.connect(database=txtDBname, user=txtNameUser, sslmode=txtSSLmode, sslrootcert=txtSSLrootCert, sslkey=t_sslkey, sslcert=t_sslcert, host=txtHostURL, port=txtPortNum)
cursorCRDB = connCRDB.cursor()

def setUpMainVariables():
    txtInput = "Jump rope with gusto."
    # in your application, this variable would not be set up
    #    as a constant as you see here.

def removeStopWords(txtSource):
    # Iterate through stop words list.
    for txtItem in listStopWords:
        txtSource.replace(txtItem, "")
    # Replace all instances of two spaces with one space.
    txtSource.replace("  ", " ")

def searchDB(txtInput):
    # Notice this is a parameterized query
    s = ""
    s += "SELECT txtContent FROM tblStuff"
    s += "WHERE"
    s += "("
    s += " txtContent LIKE (%txtInput)"
    s += ")"
    try:
        cursorCRDB.execute(s, [txtInput])
        dbRow = cur.fetchall()
        # Next (not shown here): Show the data in dbRow to your user(s)
    except psycopg2.Error as e:
        txtMessage = "Cockroach Database error: " + e + "/n SQL: " + s
        return render_template("errors.html", txtMessage = txtMessage)
    cursorCRDB.close

# Main application
def main():
    setUpMainVariables()
    removeStopWords(txtInput)
    searchDB()

# Call main application
main()

if __name__ == "__main__":
    app.run(debug = True)

Conclusion

In this tutorial document your knowledge grew about how to build an application using word pattern search for Cockroach, primarily to refine user input by removing “stop words” before using CockroachDB’s “SELECT”, “WHERE”, and “LIKE” to get the data you need. In addition, we saw how to create and use Cockroach tables, Python functions, and how to use Python to detect and branch from “try” error checking. Finally, we shared all the Python source scripts for this project.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.