How To Use Regexp and Wildcard Queries To Return Documents With a Partial String Match
Introduction
When people search for data, they’re not always looking for a single exact match. A user might want to return all the different types of bread they sell in their store, so they may search for all products in their inventory that have the word “Bread” in the name. Another user may be searching for a last name that they know begins with “Sto”, though they might not be sure how the rest of the name is spelled. Regardless of the situation, regular expressions, also known as “regexps”, and wildcard queries can be used on Elasticsearch fields of type keyword and text to allow for partial matching. For fields of type date and integer, you can also broaden your searches with the use of range queries. In this tutorial, we’ll provide step-by-step instructions on how to use regexp and wildcard queries to return documents that have only a partial match.
NOTE: Since the rollout of version 6.0, Elasticsearch enforces strict content-type checking for cURL requests. This means that cURL requests must include -H 'Content-Type: application/json' as a header option whenever the request has a JSON object in its body. The header option explicitly specifies that the content is in JSON format. If this header option is omitted, you’ll get a 406 Content-Type header error.
You can use the command curl --help for more information about the various options.
Wildcard queries
Wildcard queries allow you to specify a pattern to match instead of an exact term. In a wildcard query, ? matches any single character and * matches zero or more characters. To create a wildcard query for Elasticsearch, you need to specify the wildcard option in your query, followed by the field name and a string pattern containing one or more wildcard characters, with both enclosed in double quotes. We’ll look at an example of such a query in the following code:
Architecture of a wildcard query:
{
  "query": {
    "wildcard": {
      "{TEXT_OR_KEYWORD_FIELD}": "*{STRING_PATTERN}*"
    }
  }
}
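To make the meaning of ? and * concrete, here is a minimal Python sketch that mimics how a wildcard pattern is compared against a single indexed term. It is only an approximation for illustration, not Elasticsearch's implementation: the helper names wildcard_to_regex and wildcard_matches are hypothetical, escaping of literal wildcard characters is ignored, and the sample terms are assumed to be already lowercased.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into an equivalent compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more of any character
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(pattern: str, term: str) -> bool:
    """Return True only when the whole term matches the pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


# Print the outcome for a few pattern/term pairs.
for pattern, term in [("to*", "tom"), ("to*", "stone"), ("*cana", "americana"), ("t?m", "tom")]:
    print(pattern, term, wildcard_matches(pattern, term))
```

Note that the pattern has to cover the entire term: to* does not match "stone" because the term does not start with "to".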
Wildcard Query in the Kibana UI console:
GET people1/_search
{
  "query": {
    "wildcard": {
      "name": "to*"
    }
  }
}
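In this query, you can see that we’re searching for any values of the "name" field that begin with “to”. The pattern is compared against the indexed terms: a text field analyzed with the standard analyzer stores its terms in lowercase, so the lowercase pattern to* also matches names that were originally capitalized. Whether a pattern containing uppercase letters matches depends on the field’s mapping and your Elasticsearch version.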
You can put the asterisk in front of the string pattern if you want to match characters at the end of a term:
GET people1/_search
{
  "query": {
    "wildcard": {
      "accounts": "*cana"
    }
  }
}

# RETURNS --->
"hits" : [
  {
    "_index" : "people1",
    "_type" : "peeps",
    "_id" : "RMmU02kBXluIHJG2P24l",
    "_score" : 1.0,
    "_source" : {
      "name" : "Aurelius Americana",
      "age" : "48",
      "sex" : "male",
      "accounts" : "aurelius_americana",
      "join_date" : "2009-03-07"
    }
  }
]
You can see in this example that our wildcard query matches any value of the "accounts" field that ends in “cana”. In this case, it returned the document for “Aurelius Americana”, whose "accounts" value is “aurelius_americana”.
Wildcard Query as a cURL request:
You can easily perform wildcard queries from your terminal console via a POST cURL request. In this example, we’ll tell Elasticsearch that we want to search the index people1 by passing the name of the index in the request URL:
curl -XPOST "localhost:9200/people1/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query": {
    "wildcard": {
      "name": "to*"
    }
  }
}'
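If you would rather issue the same request from a script, the following Python sketch builds an equivalent POST request using only the standard library. It assumes the same local cluster address (localhost:9200) and people1 index used in the cURL example; the function name build_search_request is hypothetical, and the call that actually sends the request is left commented out so the block runs without a cluster.

```python
import json
import urllib.request


def build_search_request(index: str, body: dict,
                         host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a POST _search request with the JSON Content-Type header set."""
    url = f"{host}/{index}/_search?pretty"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        # Elasticsearch 6.0+ rejects JSON bodies sent without this header.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = {"query": {"wildcard": {"name": "to*"}}}
request = build_search_request("people1", query)

# To send the request to a running cluster, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Setting the header explicitly mirrors the -H option in the cURL command and avoids the 406 error described in the note at the start of this tutorial.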
Elasticsearch queries using regexp
Another method for broadening your searches to include partial matches is to use a "regexp" query, which functions in a similar manner to "wildcard". There are a number of symbols and operators used in regular expression syntax to denote wildcards and ranges of characters:
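A period "." is used to stand in for any single character.
A range of characters enclosed in brackets, such as [a-z], is a character class. A character class represents a set of characters; in this example, it acts as a stand-in for any lowercase letter.
The plus sign "+" indicates that the preceding character repeats one or more times; for example, p+ matches the “pp” in “Mississippi”.
The asterisk "*" indicates that the preceding character or character class repeats zero or more times. Note that this differs from the wildcard query, where * stands for any sequence of characters on its own.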
Let’s look at a "regexp" query that combines a character class with repetition operators. The regexp shown in the following example will match the word "Mississippi":
GET states/_search
{
  "query": {
    "regexp": {
      "name": "[a-z]*ip+i"
    }
  }
}

# RETURNS --->
"name" : "Mississippi"
More regexp examples in Kibana Console
Though our previous example returned only one result, "Mississippi", it’s possible to return many matches depending on how your regexp query is constructed. Let’s create some broader searches that will return more than one result:
GET states/_search
{
  "query": {
    "regexp": {
      "name": "mis+[a-z]*"
    }
  }
}

# RETURNS --->
"name" : "Missouri"
# ..and
"name" : "Mississippi"

GET states/_search
{
  "query": {
    "regexp": {
      "name": "[a-z]*ska"
    }
  }
}

# RETURNS -->
"name" : "Alaska"
# and..
"name" : "Nebraska"
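The three patterns used so far can be tried out locally with a short Python sketch. This is an approximation rather than Elasticsearch itself: it relies on the fact that the operators used here ([a-z], + and *) behave the same way in Python's re module, it uses re.fullmatch because an Elasticsearch regexp has to match the entire term, and it assumes each name is indexed as a single lowercased term. The STATES list and the helper names are hypothetical sample data, not the contents of the states index.

```python
import re

# Hypothetical sample data standing in for the "name" field of the states index.
STATES = ["Alaska", "Mississippi", "Missouri", "Nebraska", "Texas"]


def regexp_matches(pattern: str, term: str) -> bool:
    """Return True only when the pattern matches the whole term."""
    return re.fullmatch(pattern, term) is not None


def matching_states(pattern: str) -> list:
    """Return the states whose lowercased name matches the pattern."""
    return [state for state in STATES if regexp_matches(pattern, state.lower())]


# Print the matches for each pattern from the examples above.
for pattern in ["[a-z]*ip+i", "mis+[a-z]*", "[a-z]*ska"]:
    print(pattern, matching_states(pattern))
```

Because the whole term has to match, a bare pattern such as ska finds nothing in this sketch; the leading [a-z]* is what lets the pattern cover the beginning of the name.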
Kibana Console UI Example of regexp
You can see in this example that it’s easy to perform wildcard and regexp queries from the Kibana Console UI. Note how the regular expression used in the query matches multiple results.
A regexp query using a POST cURL request:
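Like "wildcard" queries, "regexp" queries are compared against the indexed terms, so case sensitivity depends on the field’s mapping. The following query searches the "name" field for terms beginning with “Th”; on a field whose analyzer lowercases its terms, write the pattern in lowercase (th[a-z]*) instead: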
curl -XPOST "localhost:9200/people1/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query": {
    "regexp": {
      "name": "Th[a-z]*"
    }
  }
}'
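If you need matching that ignores case regardless of how the field is mapped, Elasticsearch 7.10 and later accept a case_insensitive parameter on regexp queries. The Python sketch below only builds the request body for such a query; the helper name regexp_query is hypothetical, and the example assumes a cluster version that supports the parameter.

```python
import json


def regexp_query(field: str, pattern: str, case_insensitive: bool = False) -> dict:
    """Build a regexp query body, optionally asking for case-insensitive matching."""
    params = {"value": pattern}
    if case_insensitive:
        # Assumption: the target cluster runs Elasticsearch 7.10 or later.
        params["case_insensitive"] = True
    return {"query": {"regexp": {field: params}}}


# Print the body that would be sent to the _search endpoint.
print(json.dumps(regexp_query("name", "Th[a-z]*", case_insensitive=True), indent=2))
```

The printed JSON can be passed as the body of the cURL request shown above in place of the shorter form.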
Conclusion
If you want to ensure that your users find the information they’re looking for, you need to know how to create queries that return partial matches. Fortunately, Elasticsearch makes it easy to formulate partial-match queries using wildcards and regular expressions. With the step-by-step instructions included in this tutorial, you’ll be able to use regex and wildcard queries to return documents without requiring an exact string match.