Guide On How To Use The Grok Filter Plugin Logstash Pattern


Introduction

Plugins and filters are extremely useful for clearly defining data guidelines, letting monitoring and retrieval be tuned to pay attention to only certain datasets. Grok matches specific patterns against a bulk of unstructured data to extract only what is wanted. It enables Logstash to establish its criteria for monitoring and analysis with the simple setup laid out in the tutorial that follows.

Grok matches lines with regular expressions, maps parts of the text into named fields, and acts based on that mapping.

The Grok Filter Plugin is useful for parsing event logs and dividing messages into multiple fields. Instead of writing regular expressions from scratch, users can rely on predefined patterns for parsing logs.

Grok offers a way to parse unstructured log data into a format that can be queried.

Grok combines text patterns into something that matches system logs. A grok pattern has the format %{SYNTAX:SEMANTIC}.

This allows the use of advanced features, such as statistical analysis on fields containing values, faceted searches, filters, and more. If data cannot be classified and broken down into separate fields, every search would be full text, which prevents taking full advantage of Elasticsearch and Kibana. Grok works well with almost every type of log file.

Prerequisite

  • Both Elasticsearch and Logstash must be installed and running before Grok can be used.
  • The Grok plugin comes installed with Logstash by default, so there’s no need to separately install it.

Using Grok Filters

  • The Grok filter combines patterns into something that will match the logs. Tell Grok what to search for simply by defining a pattern: %{SYNTAX:SEMANTIC}.
  • The SYNTAX refers to the name of the pattern. For example, the NUMBER pattern can match 2.60, 7, 9 or any other number, and the IP pattern can match 192.4.73.4 or 182.34.77.5, etc.
  • The SEMANTIC is the identifier given to the matched text. This identifier becomes the key of the key-value pair created by Grok, and the value is the text that the pattern matched. In the example above, 2.60, 7, or 9 could be the duration of some event and 182.34.77.5 could be the IP address of the client.
  • Based on the above, this can be expressed quite simply with the Grok pattern %{NUMBER:duration} %{IP:client}.
  • To further break down the idea of syntax and semantic, take a sample log line from an HTTP request: 82.34.77.5 GET /page.html 16236 0.075
  • The pattern below can be used for this:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
  • In the filter section of the configuration, the syntax-semantic (key-value) pairs are defined so that the patterns in the filter map to the specific elements of the log message in sequential order.
  • To view another example, read this log from a file.


Note: Log messages were stored in /var/log/http.log

input {
  # Read events from the HTTP log file
  file {
    path => "/var/log/http.log"
  }
}

filter {
  # Split each message into client, method, request, bytes, and duration fields
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}

output {
  # Send the parsed events to Elasticsearch
  elasticsearch { hosts => ["localhost:1700"] }
}
  • Thanks to the grok filter, the above event will now have these extra fields in it (e.g., client and method):
  • client: 82.34.77.5
  • method: GET
  • request: /page.html
  • bytes: 16236
  • duration: 0.075
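
As a rough sketch (illustrative only, not output captured from this exact setup), the event produced for that line would carry the new fields alongside the original message, roughly like this:

{
      "message" => "82.34.77.5 GET /page.html 16236 0.075",
       "client" => "82.34.77.5",
       "method" => "GET",
      "request" => "/page.html",
        "bytes" => "16236",
     "duration" => "0.075"
}

Note that grok captures values as strings by default; a type can be requested in the pattern itself, e.g. %{NUMBER:bytes:int}, to store bytes as an integer.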

Building Logstash Patterns

  • Logstash ships with over one hundred common patterns by default, and users can easily add their own (a sketch of how a custom pattern might be defined appears after the example below).

  • The main purpose of using a plugin like Grok to process data with patterns is to break the data down and organize it into separate fields that can serve as parameters.

  • For example, break a log line down into the following fields: class, log level, timestamp, and the remainder of the message. A Logstash pattern takes the form %{SYNTAX:SEMANTIC}, where SYNTAX is the value to match and SEMANTIC is the name to associate it with. SYNTAX can also be a data type, such as NUMBER for a number or IPORHOST for an IP address or hostname.

grok {
  # Two patterns are listed for the "message" field; grok tries them in order
  # and stops at the first one that matches
  match => ["message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}\[%{DATA:index}\] %{NOTSPACE} \[%{DATA:updated-type}\]",
            "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\] (\[%{NOTSPACE:Index}\]\[%{NUMBER:shards}\])?%{GREEDYDATA}"]
}
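
As a sketch of how a custom pattern could be added (the SESSIONID pattern and the field names below are hypothetical and are not part of the tutorial's log format), the grok filter accepts a pattern_definitions option for defining patterns inline and a patterns_dir option for loading pattern files from disk:

filter {
  grok {
    # Hypothetical custom pattern: an eight-character uppercase alphanumeric session ID
    pattern_definitions => { "SESSIONID" => "[A-Z0-9]{8}" }
    # Alternatively, load custom patterns from files in a directory
    # (each line of a pattern file is "NAME regular-expression")
    # patterns_dir => ["/etc/logstash/patterns"]
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{SESSIONID:session} %{WORD:action}" }
  }
}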

Additional information: there are tools available to help build and test a custom Grok pattern, such as Kibana's Grok Debugger described in the Basic Pipeline section below.

NOTE: Patterns that rely on GREEDYDATA can be very expensive to evaluate, because the regular expression engine is designed to backtrack and retry, and this can potentially result in timeouts or other problems.
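
To make the trade-off concrete, here is a hedged sketch (the sample log line is invented for illustration): GREEDYDATA simply consumes the remainder of the line, which is convenient for a trailing free-text message, and the more of the line that is pinned down with specific patterns before it, the less backtracking the match needs to do.

# Assumed input line: 2021-03-04T10:15:00,123 ERROR Connection refused by upstream host
filter {
  grok {
    # Timestamp and log level are matched precisely; GREEDYDATA takes whatever is left
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{GREEDYDATA:msg}" }
  }
}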

Basic Pipeline

  • create a configuration file with a pipeline
  • run Logstash with the .conf file (e.g. [user]$ /usr/share/logstash/bin/logstash -f /usr/share/logstash/config/logstash_example.conf)
  • Logstash should output something like this to the terminal:
The stdin plugin is now waiting for input:
  • At this point, anything entered into the terminal is treated as an event and then echoed back to the terminal.
  • Grok's role is to take those input messages and give them structure; a sketch of such a configuration follows below.
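
A minimal sketch of what a configuration such as logstash_example.conf could contain for this kind of interactive test (the contents below are an assumption for illustration; the original does not show the file):

input {
  # Read events typed into the terminal
  stdin { }
}

filter {
  grok {
    # The same HTTP-style pattern used earlier in this guide
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}

output {
  # Echo each parsed event back to the terminal
  stdout { codec => rubydebug }
}

Typing a line such as 82.34.77.5 GET /page.html 16236 0.075 should then produce an event with the client, method, request, bytes, and duration fields filled in.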

  • If Kibana has been installed, the easiest way to debug and test Grok filters and Logstash patterns is the Grok Debugger in Kibana's GUI. If the Kibana service is running, just navigate to:

http://localhost:5601/app/kibana#/dev_tools/grokdebugger?_g=()

Conclusion

The Grok plugin makes handling large amounts of information much faster through its ability to target specific items in the data. Combined with Logstash, the process becomes automatic, which makes a significant difference in the time and effort required to interpret the data. The possibilities for excluding and including raw data are nearly limitless, and Logstash working in concert with Grok lets a user decide exactly what those rules are and how they are managed.
