Memory Management with the Psycopg2 Adapter for PostgreSQL (Part 2)
Memory management in the psycopg2 Python adapter for PostgreSQL
A running Python script using the psycopg2 adapter for PostgreSQL can cause some memory issues if you fail to take certain precautions or setup your app correctly. The following article will provide some tips to help you better manage your psycopg2 application’s memory and system resources. This is part two of the memory management in psycopg2 article series.
NOTE: PostgreSQL also has a memory management library called SPI you can connect to to assist in streamlining your psycopg2 Python application’s efficiency.
Prerequisites to using the psycopg2 adapter for PostgreSQL
The code in this article has only been tested on Python 3, and Python 2.7 is now deprecated and losing support, so it’s recommended that you have a version of Python greater than
3.4
installed, and that you run your scripts usingpython3
. Use thepython3 -V
command in a terminal window to verify that it’s installed.You’ll need the PIP3 package manager for Python 3 working on your machine or server in order to install the
psycopg2
library using thepip3
command. Install the adapter using the following command in a UNIX terminal or Windows command prompt:pip3 install psycopg2
.The PostgreSQL server should be installed and running, and you should have some database and table record data you can test the code on.
Install the psutil for library for Python 3
Some installations of Python may not come with psutil
pre-installed. You can use the pip3 list
command to check what packages are already installed, or use the following command, in a UNIX terminal, to install it:
1 | sudo python3 -m pip install psutil |
Or you can also install it with just the pip3 install
commands, like in the following example:
1 | pip3 install psutil |
Create a Python script and import libraries to connect to PostgreSQL
The following Python code imports the psycopg2
adapter’s sql
and connect
libraries, as well as the psutil
, sys
, and gc
libraries to help manage your system’s memory:
1 2 3 4 5 6 7 8 | # import the psycopg2 database adapter for PostgreSQL from psycopg2 import connect, sql # import Python's 'sys' and 'psutil' libraries import sys, psutil # import Python's garbage collection library import gc |
Declare a string for the PostgreSQL table name and SQL statement
Declare a string for the PostgreSQL table name so that it can be inserted into a SQL statement string concatenation later:
1 2 | # declare a string object for the table name table_name = "some_table" |
The following code uses the LIMIT
SQL keyword to restrict the number of records returned. This will help to reduce the application’s memory use in the case fetching records from extremely large PostgreSQL tables:
1 2 3 4 5 | # concatenate a SQL string from table name and columns sql_statement = "SELECT * FROM {} LIMIT 1000;".format( table_name ) # print the concatenated SQL string print ("sql_statement:", sql_statement) |
Using Python’s ‘with’ statement to manage memory and connect to PostgreSQL
Python’s with
statement allows you to execute a block of code, inside of an indentation, in order to close all of the objects within it, and free up those resources, after the code has been executed.
Connect to PostgreSQL inside of a ‘with’ indentation
The code in this section will nest a bunch of code to connect to PostgreSQL and execute transactions inside of a with
indentation block. Here’s some example code to connect to a PostgreSQL server using the as
Python keyword to assign an alias (conn
) for the connection:
1 2 3 4 5 6 7 8 | # use Python's 'with' statement to release the memory afterwards try: with connect ( dbname = "some_database", user = "objectrocket", host = "localhost", password = "mypass" ) as conn: |
NOTE: Make sure to pass the correct parameters or else PostgreSQL will raise an error.
Use the ‘with’ and ‘as’ Python keywords while using PostgreSQL cursor
The next step is to nest another with
indentation code block in order to create a cursor object, and execute the SQL string declared earlier, inside of the connection indentation:
1 2 3 4 5 6 7 8 9 | with conn.cursor() as cursor: # have sql.SQL() return a sql.SQL object sql_object = sql.SQL( # pass SQL string to sql.SQL() sql_statement ).format( # pass the identifier to the Identifier() method sql.Identifier( table_name ) ) |
NOTE: It’s recommended that you use the sql.SQL()
and sql.Identifier()
methods, instead of just passing a concatenated string to the execute()
method, to help prevent SQL injection attacks.
Setting the cursor’s name attribute to
Psycopg2 will load all of the query into memory if a name isn’t specified for the cursor object. Here’s an example of how you can do that to reduce how much memory the application uses:
1 | with conn.cursor(name='memory_test') as cursor: |
Lower the cursor object’s itersize integer value
You can also reduce the cursor’s itersize
attribute (default value is 1000
) to help save on resources by reducing the batch size of each fetch iteration:
1 2 | # reduce the itersize attribute value (default is 1000) cursor.itersize = 100 |
Call the psutil function while executing a PostgreSQL statement
Execute the sql.SQL
object by passing it to psycopg’s execute()
method. Now you can call the get_avail_memory()
Python function (explained and constructed in part one of this series) inside of the nested with
indentation block to get an idea of how much memory the PostgreSQL app is using:
1 2 3 4 5 6 7 8 9 10 11 12 | # execute the sql.SQL() object cursor.execute ( sql_object ) # use the fetchall() method to return a list of all the data table_data = cursor.fetchall() print ("table_data:", len(table_data)) # call the psutil function to check available memory print (get_avail_memory()) except Exception as err: print ("\npsycopg2 error:", err) |
Conclusion
Save the code and then execute the script using the python3
command followed by the script name. You should see a response in your terminal or command prompt window that looks something like the following:
1 2 3 4 5 6 7 8 | sql_statement: SELECT * FROM some_table LIMIT 1000; table_data: 1000 available memory: 43.05% memoryview cursor: <memory at 0x7f7153692a08> available memory: 43.04% connection.closed: 0 cursor.closed: True memoryview() cursor error: cursor already closed |
At this point you should have an arsenal of tip and methods you can use to better manage your system’s memory and resources while running a psycopg2 PostgreSQL application. Check out part 1 of this article series for a more general overview of memory management in Python.
Just the Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 | #!/usr/bin/python3 # -*- coding: utf-8 -*- # import the psycopg2 database adapter for PostgreSQL from psycopg2 import connect, sql # import Python's 'sys' and 'psutil' libraries import sys, psutil # import Python's garbage collection library import gc # declare a string object for the table name table_name = "some_table" # concatenate a SQL string from table name and columns sql_statement = "SELECT * FROM {} LIMIT 1000;".format( table_name ) # print the concatenated SQL string print ("sql_statement:", sql_statement) # a function that returns a string of the system's available memory def get_avail_memory(): # call the psutil virtual_memory() method mem = psutil.virtual_memory() # calculate the available memory avail_mem = ((mem.total - mem.used) / mem.total)*100 # concatenate a string for the memory avail_mem = "available memory: {0:.2f}%".format( avail_mem ) return avail_mem # use Python's 'with' statement to release the memory after complete try: with connect ( dbname = "some_database", user = "objectrocket", host = "localhost", password = "mypass" ) as conn: with conn.cursor(name='memory_test') as cursor: # have sql.SQL() return a sql.SQL object sql_object = sql.SQL( # pass SQL string to sql.SQL() sql_statement ).format( # pass the identifier to the Identifier() method sql.Identifier( table_name ) ) # reduce the itersize attribute value (default is 1000) cursor.itersize = 100 # execute the sql.SQL() object cursor.execute ( sql_object ) # use the fetchall() method to return a list of all the data table_data = cursor.fetchall() print ("table_data:", len(table_data)) # call the psutil function to check available memory print (get_avail_memory()) # have memoryview return the memory address for the cursor cursor_bytes = bytes(cursor) print ("memoryview cursor:", memoryview(cursor_bytes)) except Exception as err: print ("\npsycopg2 error:", err) # explicitly invoke Python's garbage collection gc.collect() # return the available memory using the psutil function print (get_avail_memory()) # check if the PostgreSQL connection and cursor objects are closed print ("connection.closed:", conn.closed) # 0 == open print ("cursor.closed:", cursor.closed) # False == open try: # should return error stating: "cursor error: cursor already closed" cursor_bytes = bytes(cursor) print ("memoryview cursor:", memoryview(cursor_bytes)) except Exception as err: print ("memoryview() cursor error:", err) # connect to PostgreSQL again print ("\nconnecting to PostgreSQL") conn = connect ( dbname = "some_database", user = "objectrocket", host = "localhost", password = "mypass" ) # instantiate a cursor object from the connection cursor = conn.cursor() # check if the cursor is closed if cursor.closed == False: print ("closing cursor") cursor.close() # check if the connection is closed if conn.closed == 0: print ("closing connection") conn.close() |
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.
Get Started