Use Psql to Insert a Record with Unicode Characters
Introduction to inserting PostgreSQL records with Unicode characters
If you try to insert a record into a PostgreSQL table using the psql command-line interface, and the record data has a Unicode character (outside of the 128 ASCII character range) then psql will return the following error:
This article will show you how to properly concatenate Unicode strings using SQL, while connected to a PostgreSQL database, in order to insert the Unicode data into a table’s string or text column.
Supported character sets in PostgreSQL
The following is a list of all of all of the supported character sets for PostgreSQL:
NOTE: Use the
SHOW client_encoding; SQL command, while connected with psql, to have Postgres return the default client encoding for the databases.
The default character set for a database is the UTF-8 (
UTF) Unicode set that encompasses the entire Unicode. If you’d like to create a database using another, more specify, character set, then make sure to use the
WITH ENCODING statement.
The following example SQL statement will create a database with Russian Cyrillic Unicode character support:
Set the default character encoding set for the PostgreSQL cluster
The following example shows you how can set the Korean Hangul character encoding as the default encoding set for the entire PostgreSQL cluster:
-D option is to specify the directory for the database system. If you don’t know where your Postgres data is stored you can use the
SHOW data_directory; command while connected with a Postgres
SUPERUSER role to have it return the directory location for you.
Prerequisites to inserting record data into a PostgreSQL table
Make sure that the Postgres service is installed and running. The example SQL Postgres statements found in this article rely on the
psql command-line interface.
Get the version numbers for Postgres and psql
You can use
psql -V and
postgres -V to get the version numbers for psql and Postgres respectively. While connected to PostgreSQL with psql you can use the
\conninfo to have Postgres return information about the server’s port (default is
5432) and database.
Create a user role for the Postgres database with Unicode records
psql postgres command to connect to PostgreSQL (using the
postgres user), and then create a user role for the database that will store the Unicode characters. The following example creates a new user with a
SUPERUSER role and an encrypted password:
Grant privileges to the user for all the tables in the public schema
This next command will grant all privileges to the user for all tables in the public schema:
Also grant the user privileges to modify the records in the target database:
Create a new PostgreSQL table for the Unicode record data
Let’s create a table for the Unicode string data. The concatenated string data will be inserted into the table’s
str VARCHAR() column. The following SQL statement creates a table with 4 columns:
id UUID PRIMARY KEY,
Get the four-digit UTF-8 bytes string for the Unicode character
You’ll need to convert the special Unicode character into a Unicode escaped string with the 4-digit control characters (e.g. the code
'\u4f60' for the character
The following Python code converts some UTF-8 string data into their respective Unicode escape sequences:
'Jag kan också köpa ost'.encode("unicode_escape")
'Grimms Märchen sind großartig'.encode("unicode_escape")
'The EU uses the €, and Britain uses the £'.encode("unicode_escape")
U+0000 (which represents the
null character) form of the Unicode control characters should be inserted into Postgres as
The above Python code should return the following respective Unicode escape sequences as bytes strings:
b'Jag kan ocks\\xe5 k\\xf6pa ost'
b'Grimms M\\xe4rchen sind gro\\xdfartig'
b'The EU uses the \\u20ac, and Britain uses the \\xa3'
Convert a raw byte into a UTF-8 Unicode code point
The UTF-8 encoding standard in psql will only accept the escaped, 4-digit Unicode control characters (
\uNNNN'), so if you only have the two-digit raw byte (
\xNN) you’ll have to convert it to the UTF-8 byte Unicode code point by replacing the
\x with into a UTF-8 escaped string with two leading-zeros (e.g.
The Swedish letter
"å", for example, can be represented with the two-digit raw bytes string
'\xe5', so its 8-byte UTF-8 form would be
Using psql to insert a record with unicode characters
Now that you have the proper 8-byte Unicode code point for your Unicode characters you can insert them into our table created earlier.
Concatenate a string using the CONCAT SQL function
We’re going to call the
CONCAT() function in SQL to concatenate a string, using the UTF-8 Unicode code points, while we insert the record data into the table.
The following SQL statement uses the
E'' escape sequence to create the Unicode string for
'你好世界', which means “Hello, World” in Mandarin Chinese:
Insert the concatenated Unicode string into one of the PostgreSQL table columns
CONCAT() function inside of the
VALUES keyword, while executing the
INSERT INTO statement, in order to insert the Unicode string data into the PostgreSQL table:
some_table (id, str, INT, bool)
CONCAT(E'\u4f60', E'\u597d', E'\uff0c', E'\u4e16', E'\u754c'),
CONCAT('Jag kan ocks', E'\u00E5', ' k', E'\u00f6', 'pa ost'),
CONCAT('Grimms M', E'\u00e4', 'rchen sind gro', E'\u00df', 'artig'),
CONCAT('The EU uses the ', E'\u20ac', ', and Britain uses the ', E'\u00a3'),
The above SQL statement should return the following result:
Conclusion to inserting records with Unicode characters into PostgreSQL
The psql interface can sometimes be a little tricky when trying to use Unicode. Many of the PostgreSQL client libraries have built-in Unicode support to help you avoid a lot of the hassle of concatenating UTF-8 code points into strings, but at least now you have a better idea of how Unicode, and the different encoding (character sets), work in the psql interface for PostgreSQL.
Pilot the ObjectRocket Platform Free!
Try Fully-Managed CockroachDB, Elasticsearch, MongoDB, PostgreSQL (Beta) or Redis.Get Started