Machine Learning with Python and Postgres

Introduction

In this tutorial, we will build a small application to see how to do machine learning with Python and Postgres. The application we build will use a linear regression machine learning methodology. We’ll expose the learning model to some numeric inputs, along with expected outputs. The model will learn the relationship between inputs and results and recognize a pattern.

Why learn to use Postgres and Python for machine learning?

Applications of machine learning

  • Many uses. As times goes on, machine learning increases in popularity, use cases, and ease of setup. Some of the many ways we can use machine learning include but are not limited to image recognition, predict new values based on old value patterns, group similar objects, predict customer needs, and detection of spam.

  • Employment viability. Having the knowledge and skills to add “machine learning” to your resume will surely increase your job prospects and salary.

Prerequisites

pip install scikit-learn
  • Use PIP to install Flask, psycopg2, random, and the scikit-learn libraries. get the scikit-learn library and research it here.

  • We’ll be using simple SQL commands like “SELECT”, “INSERT”, and “WHERE” in order to pull data from a database table and assuming you understand how these commands work with PostgreSQL.

Machine Learning Example 1

The simple application we will build that uses machine learning will help us decide whether the outdoor temperature requires that we wear a sweater. First, let’s look at some historical temperature data:

temp_celsiusb_wear_sweater
35False
30False
25False
20False
15True
10True

Most human brains will intuit the relationship here, making the connection between input and output; temp_celsius and b_wear_sweater, respectively. So, for example, at 14 degrees, you will choose to wear a sweater (and probably a coat, too) even though you were not specifically told that it would be a really good idea to.

A machine learning model trained with the above data would guess the same as you would.

Now let’s move on to a slightly more difficult example.

Machine Learning Example 2

input_01input_02input_03output
12314
34526
45632
11121374
212223134

After some analysis, you may figure out that the formula to calculate output from the above inputs is: output = input_01 + input_02 2 + input_03 3.

Now since we may want to feed the linear regression model enough data so it can learn, we may want to write a bit of code to generate a data sample much larger than what we see above.

Generate machine training data with Python

from random import randint
training_data_limit = 900
training_data_count = 200

training_input = list()
training_output = list()

for i_counter in range(training_data_count):
    a = randint(0, training_data_limit)
    b = randint(0, training_data_limit)
    c = randint(0, training_data_limit)
    i_output = a + (b*2) + (c*3)
    training_input.append([a, b, c])
    training_output.append(i_output)

Analysis

  • from random import randint. This is the library required for using the randint function to generate our random numbers.
  • training_data_limit = 900. This is the high end of all random numbers we will generate. This variable is fed to our randint function later.
  • training_data_count = 200. This is how many “rows” we will generate; how large the “range” of our input and output lists will be.
  • training_input = list(). This is to initialize our input list. A list in Python is an array.
  • training_output = list(). This initializes our output list.
  • for i_counter in range(training_data_count):. This line sets up our for loop to go from 1 to 200 since 200 is the value we stored in the training_data_count variable.
  • a/b/c = randint(0, training_data_limit). These three lines each fill a, b, and c with random numbers between 0 and 900.
  • i_output = a + (b2) + (c3). This uses the formula we set up in our first example in order to create a “solution” stored in i_output.
  • training_input.append([a, b, c]). Our three random numbers are added (append) as new “rows” in our array (list), which is a three dimensional array named training_input.
  • training_output.append(i_output). Our “solution” is added as a new “row” to the one dimensional training_output list.

Now that we have created two hundred sets of “problems” and “solutions” and stored these numbers in lists, that should be enough data to train our linear regression model with. Now we’ll learn how to do the training. Note: In our final application, instead of creating random data, we’ll pull the data from a PostgreSQL table.

from sklearn.linear_model import LinearRegression

predict_orama = LinearRegression(n_jobs=-1)
predict_orama.fit(X=training_input, y=training_output)

test_case = [[3, 5, 7]]
outcome = predict_orama.predict(X=test_case)

print('Solution : ' + outcome)
Solution: [34.]

Analysis

After giving the machine learning model enough training data, it used weighting to successfully calculate the test case we gave it.

Now – to emulate more of a real world use case, we’ll get our training data from Postgres. So we’ll write the SQL necessary to retrieve that data.

Get machine training data from Postgres

import psycopg2

s = ""
s += "SELECT"
s += " input_01"
s += ", input_02"
s += ", input_03"
s += ", output"
s += " FROM tbl_training_data"
db_cursor.execute(s)
try:
    for each db_row in db_cursor:
        training_input.append([db_row.fetch("input_01"), db_row.fetch("input_02"), db_row.fetch("input_03")])
        training_output.append(db_row.fetch("output"))
except psycopg2.Error as e:
    t_message = "Postgres Database error: " + e + "/n SQL: " + s
    return render_template("error.html", t_message = t_message)
db_cursor.close()

Analysis

The query above gets the first three numbers used as the “problem” part of our equation, along with the fourth number, “output”, which is the “solution”. We used the psycopg2 function “fetch” to pull all the data into our two lists.

Full source code

from sklearn.linear_model import LinearRegression
from flask import Flask
from flask import render_template
import psycopg2

app = Flask(__name__)
@app.route("/")

# --------------
# database creds
# --------------
t_host = "PostgreSQL database host address" # either "localhost", a domain name, or an IP address.
t_port = "5432" # default postgres port
t_dbname = "database name"
t_user = "database user name"
t_pw = "password"
db_conn = psycopg2.connect(host=t_host, port=t_port, dbname=t_dbname, user=t_user, password=t_pw)
db_cursor = db_conn.cursor()

@app.route("/main")

# ----------------
# primary function
# ----------------
def main():
    s = ""
    s += "SELECT"
    s += " input_01"
    s += ", input_02"
    s += ", input_03"
    s += ", output"
    s += " FROM tbl_training_data"
    db_cursor.execute(s)
    try:
        for each db_row in db_cursor:
            training_input.append([db_row.fetch("input_01"), db_row.fetch("input_02"), db_row.fetch("input_03")])
            training_output.append(db_row.fetch("output"))
    except psycopg2.Error as e:
        t_message = "Postgres Database error: " + e + "/n SQL: " + s
        return render_template("error.html", t_message = t_message)
    db_cursor.close()

    # -------------------------
    # Feed the ai training data
    # -------------------------
    predict_orama = LinearRegression(n_jobs=-1)
    predict_orama.fit(X=training_input, y=training_output)

    # ---------------------------------
    # Feed the ai a test case "problem"
    # ---------------------------------
    test_case = [[3, 5, 7]]
    outcome = predict_orama.predict(X=test_case)

    # ------------------------
    # Output the ai's solution
    # ------------------------
    print('Solution : ' + outcome)

# This is for command line testing
if __name__ == "__main__":
    app.run(debug=True)

Conclusion

In this tutorial, we built a simple application to see how to set up machine learning with Python and Postgres. The application we built uses the linear regression machine learning methodology. We trained the model using numeric inputs, along with expected outputs. The model used this data to learn the relationship between inputs and results and recognized a pattern, which is the underlying formula.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.