Guide on How to Use the Logstash Grok Filter Plugin
Introduction
Plugins and filters are extremely useful for clearly defining how data should be handled: monitoring and retrieval can be finely tuned to pay attention only to certain datasets. Grok matches specific parameters against a given bulk of unstructured data to extract only what is wanted. It enables Logstash to establish its criteria for monitoring and analysis with a simple setup, which is laid out in the tutorial that follows.
Grok matches lines with regular expressions, maps parts of the text into named sections, and acts based on that mapping.
The Grok filter plugin is useful for parsing event logs and dividing messages into multiple fields; instead of writing their own regular expressions, users can rely on predefined patterns for parsing logs.
Grok thus offers a way to parse unstructured log data into a format that can be queried.
Grok combines text patterns into something that will match system logs. A pattern in Grok has the format %{SYNTAX:SEMANTIC}.
This allows the use of advanced features, such as statistical analysis, on fields containing values, faceted searches, filters, and more. If data cannot be classified and broken down into separate fields, every search would be full text, which would prevent taking full advantage of Elasticsearch and Kibana. Grok is great for almost every type of log file.
Prerequisites
- Both Elasticsearch and Logstash must be installed and running before Grok can be used.
- The Grok plugin comes installed with Logstash by default, so there’s no need to separately install it.
Using Grok Filters
- The Grok filter combines patterns into something that will match the logs. Tell Grok what to search for simply by defining a pattern: %{SYNTAX:SEMANTIC}.
- The SYNTAX refers to the name of the pattern. For example, the NUMBER pattern can match 2.60, 7, 9, or any other number, and the IP pattern can match 192.4.73.24 or 182.34.77.5, etc.
- The SEMANTIC is the identifier given to the matched text. This identifier is the key of the “key-value” pair created by Grok, and the value is the text matched by the pattern. Using the examples above, 2.60 could be the duration of some event and 182.34.77.5 could be the IP address of the client.
- Based on the above, this can be expressed quite simply using the Grok pattern %{NUMBER:duration} %{IP:client}.
- To further break down the idea of syntax and semantic, let’s take a sample log line from an HTTP request:
82.34.77.5 GET /page.html 16236 0.075
- The pattern below can be used for this:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
- In the filter section of the configuration, the syntax-semantic (“key-value”) pairs are defined so that the patterns in the filter are matched, in sequence, to the specific element(s) of the log message.
- To view another example, read this log from a file.
Note: Log messages were stored in /var/log/http.log
input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:1700"]
  }
}
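Before starting Logstash, the configuration can be checked for syntax errors with the --config.test_and_exit flag; the file name http_pipeline.conf below is only an assumed location for the configuration shown above:

/usr/share/logstash/bin/logstash -f /usr/share/logstash/config/http_pipeline.conf --config.test_and_exit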
- Thanks to the grok filter, the above event will now have these extra fields in it (e.g. client and method):
- client: 82.34.77.5
- method: GET
- request: /page.html
- bytes: 16236
- duration: 0.075
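Note that grok captures every field as a string by default, including the numeric ones above. If bytes and duration should be stored as numbers in Elasticsearch, grok supports an optional type conversion suffix; the following variant of the same pattern (an optional tweak, not required by this tutorial) stores them as an integer and a float:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int} %{NUMBER:duration:float}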
Building Logstash Patterns
Logstash provides over one hundred common patterns by default, and users can easily add their own.
The main purpose of using a plugin like Grok to process data with patterns is to break the data down and organize it into distinct fields that can be used as parameters.
Break the log line down into the following fields: class, log level, timestamp, and the remainder of the message. A Logstash pattern takes the form %{SYNTAX:SEMANTIC}, where the syntax is the value to match and the semantic is the name to associate with it. The syntax can also be a datatype, such as NUMBER for a number or IPORHOST for an IP address or hostname.
grok {
  match => ["message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}\[%{DATA:index}\] %{NOTSPACE} \[%{DATA:updated-type}\]",
            "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\] (\[%{NOTSPACE:Index}\]\[%{NUMBER:shards}\])?%{GREEDYDATA}"]
}
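For illustration, a hypothetical Elasticsearch-style log line of roughly the shape the first pattern targets, together with the fields it would produce (both the line and the values are illustrative assumptions, not output from a real system):

[2019-05-14T12:34:56,789][INFO ][o.e.c.m.MetaDataMappingService] [node-1] [my-index] update_mapping [doc]

timestamp:    2019-05-14T12:34:56,789
loglevel:     INFO
source:       o.e.c.m.MetaDataMappingService
node:         node-1
index:        my-index
updated-type: doc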
- Elastic’s website has a Git repository of Logstash Grok patterns that can be used as a reference. Elastic also has a general repository of patterns that includes filters other than Grok.
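Adding a pattern of your own is done by pointing grok’s patterns_dir option at a directory of pattern files, each line of which is a pattern name followed by its regular expression. A minimal sketch, assuming a hypothetical ./patterns directory and a made-up POSTCODE pattern:

# contents of ./patterns/extra (hypothetical pattern file: NAME followed by its regex)
POSTCODE [0-9]{5}

# grok filter referencing the custom pattern alongside a built-in one
grok {
  patterns_dir => ["./patterns"]
  match => { "message" => "%{IP:client} %{POSTCODE:postcode}" }
}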
Additional information: there are tools available to aid in building a custom Grok pattern, such as the Kibana Grok Debugger described below.
NOTE: Any pattern built with GREEDYDATA can be a very expensive operation, because it is designed to backtrack and retry, and this could potentially result in timeouts or other problems.
Basic Pipeline
- Create a configuration file with a pipeline (a minimal example appears after this list).
- Run Logstash with the .conf file, e.g. [user]$ /usr/share/logstash/bin/logstash -f /usr/share/logstash/config/logstash_example.conf
- Logstash should output something like this to the terminal:
The stdin plugin is now waiting for input:
- At this point, Logstash should treat anything entered into the terminal as an event and then send it back to the terminal output.
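A minimal sketch of such a pipeline configuration, assuming the stdin and stdout plugins and reusing the grok pattern from earlier (the file name logstash_example.conf matches the command above but is otherwise arbitrary):

# logstash_example.conf - read events from the terminal, parse them with grok,
# and print the structured result back to the terminal
input {
  stdin { }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
output {
  stdout { codec => rubydebug }
}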
Grok’s role is to take input messages and give them structure.
If Kibana has been installed, the easiest way to debug and test Grok filters and Logstash patterns is the “Grok Debugger” in Kibana’s GUI. If the Kibana service is running, just navigate here:
http://localhost:5601/app/kibana#/dev_tools/grokdebugger?_g=()
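In the debugger, paste a sample log line and the Grok pattern to test, and it returns the structured result. For instance, using the example from earlier in this guide (the JSON shape shown is an illustration of the kind of output to expect, not captured output):

Sample Data:  82.34.77.5 GET /page.html 16236 0.075
Grok Pattern: %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

{
  "client": "82.34.77.5",
  "method": "GET",
  "request": "/page.html",
  "bytes": "16236",
  "duration": "0.075"
}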
Conclusion
The Grok plugin makes handling large amounts of information much faster through its ability to define specific items to target. When this functionality is combined with the Logstash tool, the process becomes automatic, which makes a significant difference in the time and effort required to interpret the data. The possibilities for excluding and including raw data are limitless, and Logstash working in concert with Grok lets the user decide exactly what those will be and how they are managed.