How to Get More Precise Document Matches When Querying Elasticsearch


Introduction

When you’re interacting with Elasticsearch, even a fairly selective query can search too broadly and return results you don’t need. Fortunately, Elasticsearch offers several queries and options that let you narrow your search further and return more precise document matches.

NOTE: Since version 6.0, Elasticsearch enforces strict content-type checking for requests. This means that cURL requests must include -H 'Content-Type: application/json' as a header option whenever the request has a JSON object in its body. The header option explicitly specifies that the body is in JSON format. If it is omitted, Elasticsearch returns a 406 Content-Type header error. You can use the command curl --help for more information about the various options.
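The same header rule applies when requests come from a script instead of cURL. The following is a minimal Python sketch, assuming a local node at localhost:9200 and the people1 index used later in this article. The helper name build_search_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

def build_search_request(index, body, host="http://localhost:9200"):
    """Build (but do not send) a _search request with a JSON body.

    The host and index values are assumptions for illustration.
    """
    url = f"{host}/{index}/_search?pretty"
    data = json.dumps(body).encode("utf-8")
    # Elasticsearch 6.0+ expects an explicit Content-Type header
    # whenever the request carries a JSON body.
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET",
    )

req = build_search_request("people1", {"query": {"match_all": {}}})
print(req.get_method(), req.full_url)
# urllib stores header names in capitalized form ("Content-type").
print(req.get_header("Content-type"))
```

To actually send the request, you could pass req to urllib.request.urlopen while a node is running at the assumed address.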

Typical construction of a match query request

The "match" query is versatile: it performs a full-text search, which makes it a “go-to” query for many use cases. In the following example, the "match" query returns multiple results, because by default a document matches when its name field contains any one of the analyzed search terms:

curl -X GET "localhost:9200/people1/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "Oct Locke" } }
      ]
    }
  }
}
'

[Screenshot: Kibana Console "must" match query returning 6 hits]
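To see why a two-word "match" query is so broad, it can help to model the matching locally. The sketch below is a simplification, not Elasticsearch's implementation: analyze is a rough stand-in for the standard analyzer (lowercase, split on whitespace), and the sample documents are hypothetical rather than taken from the people1 index. It also builds a request body that uses the "match" query's operator option to require every term.

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()

def match(docs, field, query, operator="or"):
    """Return docs sharing any ("or") or all ("and") analyzed query tokens."""
    query_tokens = set(analyze(query))
    hits = []
    for doc in docs:
        common = query_tokens & set(analyze(doc[field]))
        if operator == "and":
            matched = common == query_tokens
        else:
            matched = bool(common)
        if matched:
            hits.append(doc)
    return hits

# Hypothetical sample documents for illustration only.
people = [
    {"name": "Oct Locke"},
    {"name": "Oct Rivers"},
    {"name": "Dana Locke"},
    {"name": "Sam Hill"},
]

broad = match(people, "name", "Oct Locke")          # default behavior: any token
narrow = match(people, "name", "Oct Locke", "and")  # every token required
print([d["name"] for d in broad])
print([d["name"] for d in narrow])

# Request body asking Elasticsearch itself to require every term.
narrow_body = {
    "query": {"match": {"name": {"query": "Oct Locke", "operator": "and"}}}
}
```

In this toy data set the default behavior returns three names, while requiring every token leaves only "Oct Locke".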

Even this relatively simple URI _search request still returns more than one hit. Note that the second URL parameter is joined with &, not a second ?:

curl -XGET "localhost:9200/people1/peeps/_search?q=Oct&pretty"

JSON Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 5.4086657,
    "hits" : [
      {
        [...]
      }
  }
}
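When you check how broad a query is from a script, the number to look at is hits.total. Here is a small Python sketch that reads it from a response like the one above. The helper name total_hits is hypothetical, and the elided hits array is represented as an empty list. Elasticsearch 6.x reports hits.total as an integer, while 7.x and later report an object with a value key, so the helper accepts both.

```python
import json

def total_hits(response):
    """Return the total hit count from a parsed _search response.

    Handles both the integer form (6.x) and the object form
    {"value": ..., "relation": ...} used by 7.x and later.
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total

# Trimmed copy of the response above; the elided hits are left empty.
raw = """
{
  "took": 3,
  "timed_out": false,
  "hits": {"total": 4, "max_score": 5.4086657, "hits": []}
}
"""
response = json.loads(raw)
print(total_hits(response))
print(response["hits"]["max_score"])
```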

Get the _mapping of an Elasticsearch Index

If you want to retrieve the mapping definition for an index, you can use a GET cURL request to have it return the index’s _mapping as a JSON object:

curl -XGET "localhost:9200/people1/_mapping/peeps?pretty"

You can then use the mapping layout of the index to see all the field names in the index and their associated data types. When you have a better understanding of an index’s mapping, you can more precisely query data within it:

{
  "people1" : {
    "mappings" : {
      "peeps" : {
        "properties" : {
          "accounts" : {
            "type" : "text"
          },
          "age" : {
            "type" : "integer"
          },
          "join_date" : {
            "type" : "date"
          },
          "name" : {
            "type" : "text"
          },
          "sex" : {
            "type" : "text"
          }
        }
      }
    }
  }
}
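A mapping response like this one is easy to summarize in a script, which is handy for deciding which fields suit full-text queries and which suit exact ones. The Python sketch below assumes the 6.x-style layout shown above, with a document type level ("peeps") under "mappings"; later Elasticsearch versions omit that level, so the lookup path would need adjusting. The helper name field_types is hypothetical.

```python
def field_types(mapping, index, doc_type):
    """Map each field name to its declared type for one index and document type."""
    properties = mapping[index]["mappings"][doc_type]["properties"]
    # Fields without an explicit "type" are treated as objects here.
    return {name: spec.get("type", "object") for name, spec in properties.items()}

# The mapping shown above, as a Python dictionary.
mapping = {
    "people1": {
        "mappings": {
            "peeps": {
                "properties": {
                    "accounts": {"type": "text"},
                    "age": {"type": "integer"},
                    "join_date": {"type": "date"},
                    "name": {"type": "text"},
                    "sex": {"type": "text"},
                }
            }
        }
    }
}

types = field_types(mapping, "people1", "peeps")
text_fields = sorted(name for name, kind in types.items() if kind == "text")
print(types)
print(text_fields)
```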

Term Query

Unlike "match" queries, which perform a full-text search on analyzed text, the "term" and "terms" queries don’t analyze the search value. They return only documents that contain the exact term as it was indexed. A "term" query is useful when it’s important to find the exact document specified in a search. In this example, we’re looking for a specific _id:

GET /animals/_search
{
  "query": {
    "term": {
      "_id": "Dslr02kBXluIHJG2BGRZ"
    }
  }
}

The following code shows the _mapping layout of an index called "example":

{
  "example" : {
    "mappings" : {
      "docs" : {
        "properties" : {
          "content" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

The next example, which is run from the Kibana Console UI, shows a "term" query. In a "term" query, you search for a single exact value; a "terms" query, by contrast, lets you pass an array of values and returns all documents that match at least one of them in the field being searched:

GET example/_search
{
  "query": {"term": {"content": "test"}}
}

[Screenshot: Using "term" and "terms" _search queries in the Kibana Console UI]

Using "term" and "terms" queries can be helpful when you want a structured search where only an exact match is returned, for example, a search for all books in an index with a specific publisher.
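The difference between "term" and "terms", and the reason exact lookups on a text field can surprise you, can be sketched locally. The Python below is a simplified model, not Elasticsearch's implementation: index_terms is a rough stand-in for how a text field is analyzed at index time (lowercase, split on whitespace), and the documents are hypothetical. The last line shows the request body shape for a "terms" query on the content field.

```python
def index_terms(text):
    """Rough stand-in for analysis of a text field: lowercase, split on whitespace."""
    return set(text.lower().split())

def term_query(docs, field, value):
    """Exact lookup: the value is compared to indexed terms without analysis."""
    return [d for d in docs if value in index_terms(d[field])]

def terms_query(docs, field, values):
    """Match documents containing at least one of the given exact values."""
    return [d for d in docs if index_terms(d[field]) & set(values)]

# Hypothetical documents for illustration only.
docs = [
    {"content": "This is a Test"},
    {"content": "Another example"},
    {"content": "Nothing relevant"},
]

print(len(term_query(docs, "content", "test")))  # indexed term is lowercase
print(len(term_query(docs, "content", "Test")))  # capitalized value is not an indexed term
print(len(terms_query(docs, "content", ["test", "example"])))

# Request body for a "terms" query against the content field.
terms_body = {"query": {"terms": {"content": ["test", "example"]}}}
```

Because the search value is not analyzed, "Test" finds nothing in this model even though the original text contains it, which is why exact-value fields are usually the better target for these queries.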

Making a GET request with match_phrase and minimum_should_match

When a "bool" query contains multiple "should" clauses, you can add a useful option called minimum_should_match. This allows you to pass an integer value defining how many of those clauses a document must match to qualify as a “hit” and be returned:

curl -XGET "localhost:9200/people1/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool" : {
      "should" : [
        {"match_phrase" : {"name" : "oct" }},
        {"match_phrase" : {"name" : "locke" }}
      ],
      "minimum_should_match" : 2
    }
  }
}
'

In the example above, there are two keywords specified as search conditions, and setting minimum_should_match to 2 dictates that both keywords must match in order for a document to be returned:

[Screenshot: query returns 3 hits of Oct Locke]
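The effect of minimum_should_match can be modeled as counting how many "should" clauses each document satisfies. The following Python sketch is a simplified illustration under stated assumptions: contains_phrase approximates "match_phrase" with lowercase, whitespace-split tokens that must appear consecutively, and the documents are hypothetical.

```python
def contains_phrase(text, phrase):
    """True if the phrase's tokens appear consecutively in the text's tokens."""
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

def should_match(docs, field, phrases, minimum_should_match):
    """Keep docs that satisfy at least `minimum_should_match` of the phrases."""
    hits = []
    for doc in docs:
        matched = sum(1 for phrase in phrases if contains_phrase(doc[field], phrase))
        if matched >= minimum_should_match:
            hits.append(doc)
    return hits

# Hypothetical documents for illustration only.
people = [
    {"name": "Oct Locke"},
    {"name": "Oct Rivers"},
    {"name": "Dana Locke"},
]

loose = should_match(people, "name", ["oct", "locke"], 1)
strict = should_match(people, "name", ["oct", "locke"], 2)
print([d["name"] for d in loose])
print([d["name"] for d in strict])
```

With a threshold of 1, any document matching either keyword qualifies; raising it to 2 keeps only documents that match both.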

Use min_score to limit the hits

When you issue a query, every matching document is given a relevance score (_score) based on how well it matches the query and on any other settings applied to the search. You can use min_score to filter out less-relevant “hits”: documents whose _score falls below the threshold are excluded, so only documents that more accurately reflect the query are returned.

Kibana Console UI GET Example:

GET example/_search
{
  "min_score": 0.5,
  "query": {
    "query_string": {
      "query": "test"
    }
  }
}

[Screenshot: min_score filter for Elasticsearch queries]
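The cutoff that min_score applies can be illustrated on a list of scored hits. This Python sketch only mimics the filtering step: the _id and _score values are made up for illustration, and in a real search Elasticsearch applies the threshold on the server before returning results.

```python
def apply_min_score(hits, min_score):
    """Drop hits whose _score is below min_score; equal scores are kept."""
    return [hit for hit in hits if hit["_score"] >= min_score]

# Hypothetical scored hits for illustration only.
hits = [
    {"_id": "1", "_score": 1.2},
    {"_id": "2", "_score": 0.5},
    {"_id": "3", "_score": 0.3},
]

kept = apply_min_score(hits, 0.5)
print([hit["_id"] for hit in kept])
```

Scores depend on the query and the data in the index, so a useful threshold is usually found by inspecting the _score values of a few real responses first.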

Conclusion

There are times when you don’t want your search to be broad and expansive. Instead, you want a more selective search that returns precise and accurate results. In this tutorial, we’ve shown you several tools and query options that can help you narrow down your searches: "term" and "terms" queries, minimum_should_match and min_score. With the instructions and explanations provided above, you’ll be ready to create targeted queries that return more precise document matches.
