How to Use a Mapping Char Filter in Elasticsearch
Introduction
Have you ever wanted to do a “find and replace” on the text in an Elasticsearch index? Maybe you want to replace all single quotes with double quotes, or strip out instances of a profanity. Fortunately, there’s an easy way to accomplish this: a mapping char filter. A char filter rewrites text as it is analyzed, so the replacement affects the terms that get indexed and searched, while the stored _source of each document is left unchanged. In this tutorial we’ll show how to use a mapping char filter in Elasticsearch to find all instances of a string and replace them with a new string. If you’re already familiar with the concept of character filters and would prefer to dive right into the sample code, feel free to skip ahead to Just the Code.
Prerequisites
Before we can create a mapping char filter, we need to make sure a few prerequisites are in place. For this task the system requirements are minimal:
- Elasticsearch should be installed and running.
- Kibana should be installed and running.
To check if Elasticsearch is running, just execute the following command in the terminal:
```
curl "http://localhost:9200/_cluster/health?pretty"
```
You should receive output containing information about your instance of Elasticsearch. If you know that Elasticsearch is installed but you don’t receive the expected output, you may need to restart the process on your machine.
To check if Kibana is running, visit the server’s status page in a browser:
```
localhost:5601/status
```
You should see status information about your installation of Kibana.
Use a Mapping Char Filter in Elasticsearch
STEP ONE – Create a Mapping Char Filter
Let’s take a look at an example of how to create a mapping char filter. In our example, we want to find all instances of "&" in our data and replace them with the word "and". A mapping char filter makes this task simple: you provide the string you’d like to find and the string you’d like to replace it with. The code below shows how to create a mapping char filter. It’s a bit complex, but the explanations that follow will clarify what’s going on:
```
PUT demo_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ampersand_analyzer": {
          "tokenizer": "keyword",
          "char_filter": [
            "ampersand_char_filter"
          ]
        }
      },
      "char_filter": {
        "ampersand_char_filter": {
          "type": "mapping",
          "mappings": [
            "& => and"
          ]
        }
      }
    }
  }
}
```
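Let’s take a closer look at what’s happening in the code. First, we defined our character filter under "char_filter". We named it "ampersand_char_filter" and set its type to "mapping". Next, we defined a mapping that replaces "&" with "and". Finally, under "analyzer", we defined "ampersand_analyzer", which applies our character filter and then uses the "keyword" tokenizer, so the entire input comes back as a single token. If we had any other replacements we wanted to make, we’d add them to the "mappings" array: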
```
"char_filter": {
  "ampersand_char_filter": {
    "type": "mapping",
    "mappings": [
      "& => and"
    ]
  }
}
```
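To make the behavior of the "mappings" array more concrete, here is a small Python sketch that mimics what a mapping char filter does with a list of "key => value" rules. It is only a simulation for illustration, not Elasticsearch code: the function names parse_rules and apply_mappings are made up for this example, and the sketch assumes that whitespace around "=>" is ignored and that the longest matching key wins when more than one rule could apply at the same position.

```python
# Illustrative simulation only; Elasticsearch does this work internally.
def parse_rules(rules):
    """Turn ["& => and", ...] into a dict of {key: replacement}."""
    table = {}
    for rule in rules:
        key, _, value = rule.partition("=>")
        # Assumption: whitespace around the arrow is not part of the rule.
        table[key.strip()] = value.strip()
    return table


def apply_mappings(text, rules):
    """Replace each key with its value, preferring the longest key at each position."""
    table = parse_rules(rules)
    # Try longer keys first so ":)" is preferred over ":" when both could match.
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if key and text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No rule matched here, so copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(apply_mappings("I love databases & data.", ["& => and"]))
    # prints: I love databases and data.
```

Adding a second rule to the list, such as "@ => at", extends the sketch in the same way that adding another entry to the "mappings" array extends the real filter.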
If you’re planning to add a large number of mappings to your mapping char filter, you can opt to use the "mappings_path" option instead of "mappings". It lets you provide a UTF-8 encoded text file that contains all of the mappings, one mapping per line. The path must be either absolute or relative to the Elasticsearch config directory.
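As a rough sketch of what working with "mappings_path" could look like, the Python below writes a list of rules to a UTF-8 text file, one rule per line, and builds the matching "char_filter" fragment. The helper names and the file name char_mappings.txt are hypothetical and chosen only for this example; on a real cluster you would place the file where Elasticsearch can read it, typically under the config directory on each node.

```python
import json
import tempfile
from pathlib import Path


def write_mappings_file(path, rules):
    """Write one 'key => value' rule per line as UTF-8 text."""
    Path(path).write_text("\n".join(rules) + "\n", encoding="utf-8")


def char_filter_settings(mappings_path):
    """Build a char_filter fragment that points at a file instead of inline rules."""
    return {
        "char_filter": {
            "ampersand_char_filter": {
                "type": "mapping",
                "mappings_path": mappings_path,
            }
        }
    }


if __name__ == "__main__":
    # A temporary directory stands in for the Elasticsearch config directory here.
    with tempfile.TemporaryDirectory() as config_dir:
        target = Path(config_dir) / "char_mappings.txt"  # hypothetical file name
        write_mappings_file(target, ["& => and", "@ => at"])
        print(target.read_text(encoding="utf-8"))

    # The path is given relative to the config directory in this sketch.
    print(json.dumps(char_filter_settings("char_mappings.txt"), indent=2))
```

The printed fragment takes the place of the "char_filter" section in the index settings; the analyzer definition that refers to "ampersand_char_filter" stays the same.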
Note: Keep in mind that character filtering occurs before the tokenizer evaluates your string. If your tokenizer is not breaking up text in the way that you expect, it may be because of the action of your character filter.
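The ordering described in the note can be illustrated with a tiny Python simulation. Both functions are stand-ins invented for this example: char_filter imitates the "& => and" rule with a plain string replacement, and whitespace_tokenize imitates a tokenizer that splits on whitespace. Neither is the actual Elasticsearch implementation.

```python
def char_filter(text):
    # Stand-in for the "& => and" mapping rule.
    return text.replace("&", "and")


def whitespace_tokenize(text):
    # Stand-in for a tokenizer that splits on whitespace.
    return text.split()


if __name__ == "__main__":
    # The tokenizer only ever sees the filtered text, never the original "&".
    print(whitespace_tokenize(char_filter("R&D & sales")))
    # prints: ['RandD', 'and', 'sales']
```

Because the filter runs first, "R&D" reaches the tokenizer as "RandD" and stays a single token. If that isn’t the result you want, one option is to adjust the replacement string, for example by padding it with spaces.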
STEP TWO – Verify your Mapping Char Filter works in Kibana
Once your mapping char filter is set up, you can verify that it works by using the Analyze API. This API runs the data through the character filters as well as the tokenizers that are defined in the analyzer. It then returns results.
Here’s the Kibana command to test out your character filter:
```
POST demo_index/_analyze
{
  "analyzer": "ampersand_analyzer",
  "text": "I love databases & data."
}
```
The results from that command would look like this:
```
{
  "tokens" : [
    {
      "token" : "I love databases and data.",
      "start_offset" : 0,
      "end_offset" : 24,
      "type" : "word",
      "position" : 0
    }
  ]
}
```
As you can see, our character filter worked correctly: the "&" has been replaced with "and" as we expected.
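One detail in the response is worth a second look: "end_offset" is 24, even though the returned token is longer than that. Offsets in the Analyze API response refer to positions in the original text rather than in the filtered text. The quick Python check below uses a plain string replacement as a stand-in for the char filter to compare the two lengths.

```python
original = "I love databases & data."
filtered = original.replace("&", "and")  # stand-in for the mapping char filter

print(len(original))  # 24, which matches "end_offset" in the response
print(len(filtered))  # 26, the length of the token text after replacement
```

This matters for features that rely on offsets, since they point back into the text you submitted and not into the rewritten token.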
Conclusion
When you need to make certain replacements throughout your dataset, you want to get the job done quickly and efficiently. Fortunately, Elasticsearch makes this task easy with the use of mapping char filters. The mapping char filter offers a “find and replace” functionality that allows you to clean up or standardize your data with minimal effort. With the step-by-step instructions offered in this tutorial, you’ll have no trouble transforming your data by using a mapping char filter in Elasticsearch.
Just the Code
If you’re already familiar with the concept of character filters, here’s all the code you need to use a mapping char filter in Elasticsearch:
Create a mapping char filter:
```
PUT demo_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ampersand_analyzer": {
          "tokenizer": "keyword",
          "char_filter": [
            "ampersand_char_filter"
          ]
        }
      },
      "char_filter": {
        "ampersand_char_filter": {
          "type": "mapping",
          "mappings": [
            "& => and"
          ]
        }
      }
    }
  }
}
```
Verify your mapping char filter works in Kibana:
```
POST demo_index/_analyze
{
  "analyzer": "ampersand_analyzer",
  "text": "I love databases & data."
}
```