Data Onboarding in Splunk

Caroline Lea
May 25, 2018
02:24 pm

By: Joe Wohar | Splunk Consultant

Splunk is an amazing platform for analyzing any and all data in your business; however, you may not be getting the best performance out of it if you’re using the default settings. To get the best performance when ingesting data, it is important to specify as many settings as possible in a file called “props.conf” (commonly referred to as “props”). Props defines ingestion settings per sourcetype, and if you do not put anything into props for your sourcetype, Splunk will automatically try to figure the settings out for you. While this can be a good thing when you’re first beginning with Splunk, having Splunk work out how to parse and ingest your data adds work at ingestion time and affects overall performance. By configuring the ingestion settings manually, you spare Splunk that guesswork. These are the 8 settings that you should set for every sourcetype in order to get the best performance:

SHOULD_LINEMERGE – As the name suggests, this setting determines whether lines from a data source file are merged into multi-line events. If your data source file contains one full event per line, set this to “false”; if it contains multiple lines per event, set this to “true”. If you set this to “true”, you’ll also need other settings such as BREAK_ONLY_BEFORE or MUST_BREAK_AFTER to tell Splunk where to break the data into events.
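As a minimal sketch of both cases, the stanzas below show what this could look like in props.conf. The sourcetype names and the date pattern are hypothetical and assume multi-line events that each begin with a date such as 2018-05-25.

```ini
# props.conf
# Hypothetical sourcetype: one complete event per line
[my_single_line_app]
SHOULD_LINEMERGE = false

# Hypothetical sourcetype: events span several lines (e.g. stack traces)
[my_multiline_app]
SHOULD_LINEMERGE = true
# Assumption: every new event starts with a date like 2018-05-25
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}
```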

LINE_BREAKER – This setting divides up the incoming data based on a regular expression defining the “breaks” in the data. By default, it breaks on newlines; however, if your events are all on the same line, you’ll need to create a regular expression to divide the data into lines.
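For illustration, here is a sketch for a source that writes all of its events on one line, separated by a “||” delimiter. The sourcetype name and the delimiter are assumptions made for this example. The regular expression needs a capturing group: the text matched by the first group marks the break and is discarded.

```ini
# props.conf
# Hypothetical sourcetype whose events share one line, separated by "||"
[my_delimited_app]
SHOULD_LINEMERGE = false
# The first capturing group marks the break and is dropped from the events
LINE_BREAKER = (\|\|)
```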

TRUNCATE – TRUNCATE cuts an event off once its length exceeds the value set, in bytes. The default is 10000. It’s a good idea to lower this to better fit your data, and it’s absolutely necessary to increase it if your events exceed 10000 bytes.
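A short sketch, assuming a hypothetical sourcetype whose largest events run to roughly 40,000 bytes; the value is illustrative and should be sized to your own data.

```ini
# props.conf
# Hypothetical sourcetype with unusually long events
[my_long_events]
# Assumption: the longest legitimate event is under 50,000 bytes
TRUNCATE = 50000
```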

TIME_PREFIX – This setting takes a regular expression for what precedes the timestamp in events so that Splunk doesn’t have to search through the event for the timestamp.

MAX_TIMESTAMP_LOOKAHEAD – This tells Splunk how far to check after the TIME_PREFIX for the full timestamp so that it doesn’t keep reading further into the event.

TIME_FORMAT – This defines the timestamp format using strptime-style format variables (for example, %Y-%m-%d %H:%M:%S). Without this defined, Splunk has to go through its list of predefined timestamp formats to determine which one should be used.
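The three timestamp settings work together, so a combined sketch may help. It assumes a hypothetical sourcetype whose events look like “[2018-05-25 14:24:07.123] INFO Service started”; the sourcetype name and event layout are assumptions for this example.

```ini
# props.conf
# Hypothetical sourcetype; sample event:
# [2018-05-25 14:24:07.123] INFO Service started
[my_single_line_app]
# The timestamp follows the opening bracket at the start of the event
TIME_PREFIX = ^\[
# "2018-05-25 14:24:07.123" is 23 characters long
MAX_TIMESTAMP_LOOKAHEAD = 23
# %3N is Splunk's variable for three-digit subseconds (milliseconds)
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
```

With these in place, Splunk reads exactly 23 characters after the bracket and parses them with one known format instead of searching the event.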

EVENT_BREAKER – This setting should be set on the Splunk Forwarder installed on the host where the data originally resides. It takes a regular expression which defines the end of events so that only full events will be sent from the Splunk Forwarder to your indexers.

EVENT_BREAKER_ENABLE – This setting merely tells Splunk to start using the EVENT_BREAKER setting. It defaults to “false”, so when you use EVENT_BREAKER, you’ll need to set this to “true”.
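A sketch of the forwarder side, assuming a hypothetical sourcetype with one event per line. This stanza would go in props.conf on the forwarder rather than on the indexers.

```ini
# props.conf on the forwarder
# Hypothetical sourcetype with one event per line
[my_single_line_app]
EVENT_BREAKER_ENABLE = true
# Events end at a newline; the capturing group marks the boundary
EVENT_BREAKER = ([\r\n]+)
```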

There are many other settings which can be used, but as long as you have these defined, Splunk will perform much better than if it has to figure them out on its own. For more information on these settings, visit Splunk’s documentation on props: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf
