Python Pagination of Postgres

Introduction

This article is a tutorial on Python Pagination of Postgres data. For many reasons, you may want to limit return of recordsets to small chunks, as opposed to returning large recordsets that can be expensive both in terms of bandwidth, RAM, and time on the server, as well as economy for the user. Our methodology:

  • What and why? What is Python pagination in PostgreSQL? Why would we want to paginate our Postgres data with Python and in what circumstances?
  • How? How do we paginate PostgreSQL results with Python? What are the quickest, easiest ways to do this for smaller data sets and what are the more robust and efficient ways for large databases and/or situations where server or client resources are limited and accuracy is paramount?

Prerequisites

  • SQL: Understanding the basics of writing SQL for Postgres. We use dBeaver because of its ease and number of features.
  • SELECT: We make liberal use of the “SELECT” command in this article to query a Postgres database with a Python application.
  • Tutorial on naming conventions showing why you may want to prefix your variables, column names, table names, etc. as you see done in this article. For example, naming “tvariable” with the “t” prefix to define it as a “text” (string) object and “tbl_” before table names in order to clearly distinguish those objects as tables. The tutorial goes a bit deeper, as well, talking about how to name variables based on a hierarchy system.

Create a Postgres table to paginate

Let’s say we are creating a forum or content management system (CMS) in Python and Postgres and we want a table for storing forum posts.

CREATE TABLE public.page_contents (
    id serial NOT NULL,
    id_session int4 NULL DEFAULT 0,
    id_author int4 NULL DEFAULT 0,
    t_title VARCHAR(256) NULL,
    t_contents text NULL,
    d_created DATE NULL DEFAULT now(),
    CONSTRAINT page_contents_pkey PRIMARY KEY (id)
);
CREATE UNIQUE INDEX page_contents_id_idx ON public.page_contents USING btree (id);

Now that we have a PostgreSQL table, we’ll fill it with data, so we have something to test with:

INSERT INTO
    page_contents
    (
    id_author
    , t_title
    )
SELECT
    RANDOM()*100::INTEGER + 1 AS id_author
    , array_to_string(array(SELECT substr('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',((random()*(36-1)+1)::INTEGER),1) FROM generate_series(1,32)),'') AS t_title;
FROM
    generate_Series(1,10000);

Overview: We’re inserting 10000 records into our “page_contents” table. For each record, we’re filling the “id_author” field with a random number between 1 and 100, inclusive of those two numbers. We’re also filling the “t_title” field with a random string of 32 characters from the alphabet.

Note: If you need a larger dataset, merely increase the “10000” you see above to a larger number.

Analysis

  • INSERT: Used PostgreSQL’s “random” function to generate a random number between 0 and 99, cast it as an integer (“::integer”), and then add “1”, so that the result is between (inclusive) 1 and 100 and is of sufficient data type to place into the “id_author” column.
  • array_to_string: We’re randomly selecting letters 32 times (“generate_series(1,32)”) from a 36 character long string from A to Z and 0 to 9 in order to build a string that is inserted into the “t_title” field.
  • generate_Series(1,10000): This tells the Postgres SQL engine to generate 10,000 rows in the table.

Now that we have a large dataset to work with, let’s look at two distinct methods for retrieving data from this table using Python.

Python pagination of Postgres quick and dirty method

The least efficient but most oft-used is usually an object or library that – underneath – uses “limit-offset” pagination. For situations where your database is small or medium in size, this should work fine, with few drawbacks, if any. Later, when we utilize a more robust method, we’ll talk about the differences.

First, the benefits of any method of pagination:

  • The first page of your data will load faster. For user perception of how fast your app is, this is invaluable!
  • Faster views of large datasets.
  • Potential for business logic encapsulation (depends on how you do it).
  • The client is receiving less data at a time, which can be super useful, if not a requirement in certain environments.

Next, the methods we’ll use include total, limit, offset, and skip.

Now let’s look at some code:

page_current = 3
records_per_page = 10
offset = (page_current - 1) * records_per_page

s = ""
s += " SELECT *"
s += " FROM page_contents"
s += " ORDER BY id"
s += " LIMIT " + records_per_page
s += " OFFSET " + offset

# resulting "LIMIT" part of query: "LIMIT 10"
# resulting "OFFSET" part of query: "OFFSET 20"

Analysis

  • page_current: For testing purposes, we set up our current page to be 3.
  • records_per_page: We want to return only 10 records per page.
  • offset: This is the parameter that tells Postgres how far to “jump” in the table. Essentially, “Skip this many records.”
  • s: Creates a query string to send to PostgreSQL for execution.

The biggest problem with the above “limit offset” method of pagination is that “under the covers” Postgres is adding up the total amount of records, counting pages based on that number, and scanning the index to count how many rows. With first few pages, this cost may not be evident. But with larger recordsets and even medium-sized ones, the higher you increase “page_current”, the slower things will get. Fortunately, there are many ways we can do pagination of Postgres data.

A better method: seek and keyset pagination

A solution that alleviates some of the cons associated with using the above method is to pass in the unique ID of the last record in the previous page.

ID_page_previous = 200
page_current = 3
records_per_page = 10
# limit_optional_offset no longer needed!
limit_optional_offset = (page_current - 1) * records_per_page

s = ""
s += " SELECT *"
s += " FROM page_contents"
s += " WHERE id > " & id_page_previous
s += " ORDER BY id"
s += " LIMIT " + records_per_page
# s += " OFFSET " + offset

# resulting "LIMIT" part of query: "LIMIT 10"

Notice, with this method, we no longer need the OFFSET command. By adding the “WHERE” clause to tell PostgreSQL to leave out all previous records, so it knows where to begin showing records, we keep using LIMIT to command how many records to return.

While there are surely more efficient methods to paginate a Postgres table, the above method reaps massive benefits in speed and efficiency. A very important considertion is: “id” must be indexed.

Conclusion

In this tutorial we learned two methods for using Python for pagination of Postgres data. We began by creating a PostgreSQL table, then added 10,000 records to the table, so we had something to base test on, and then we compared two methods of pagination in Python, and finally, we analyzed the pros and cons of both methods. During this process, we made use of PostgreSQL’s “random” function, Python string concatenation for building SQL to send to Postgres, and variable creation.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.