Optimizing Splunk Searches

Caroline Lea
December 6, 2019
10:00 am

Not every search needs to be optimized. A search is worth optimizing if it will be run often or if it queries large amounts of data. With Splunk Processing Language (SPL), you can obtain the same set of results using different combinations of commands. When constructing your searches, however, it is important to consider the impact on memory and network resources in your environment.

How to Determine Whether Your Search is Unoptimized

- It runs for a long period of time
- It retrieves larger amounts of data than needed from the indexes (view the amount of data retrieved from the indexer in the Job Inspector)
- You tend to hit the disk search quota while running the search
- It results in a slow or sluggish system

Creating Optimized Searches

To optimize the speed at which your search runs, it is important that you minimize the amount of processing time required by each component of the search. Your search can be slow because of the complexity of your query to retrieve events from the index. Below are some useful guidelines:

Choose an Appropriate Time Frame

Set the time picker to search within the exact time window your results should be found. This will limit the number of buckets that will need to be searched in the specified index. For instance, if you specify a search for the past 24 hours, only buckets in the specified index with data for the last 24 hours will be searched. “All time” searches are discouraged and “Real-time” searches should almost never be used due to resource consumption.
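As a minimal sketch, the same restriction can be written inline with the earliest and latest time modifiers instead of the time picker. The index, source type and field values below are illustrative assumptions borrowed from the other examples in this post:

```spl
index=audit sourcetype=access_combined action=failed earliest=-24h latest=now
| stats count by user
```

Time modifiers written in the search string override the time range picker, so the narrow window stays in place when the search is saved or shared.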

Use an Efficient Search Mode

Splunk has three search modes: Fast, Smart and Verbose. Change your search mode depending on what you need to see. Use Verbose mode sparingly and only when needed. Because it returns all of the fields and event data it possibly can, it takes the longest time to run. For more on search modes, see the Splunk documentation listed under Reference Links.

Retrieve only what is needed

How you construct your search has a significant impact on the number of events retrieved from disk. Be restrictive and specific when retrieving events from the index. If you need only a portion of the data, limit your search early to extract that portion before any data manipulation or calculations. You can specify an index, source type, host, source, or specific words or phrases in the events. Include as many search terms as possible in your base search. Also, use the head command to limit the events retrieved when you need just a subset of the data, remove unnecessary fields from the search results with the fields command, and filter out unneeded events with the where command.

Example:

index=audit sourcetype=access_combined host=admin (action=failed OR action=cancelled) | stats count by user
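A variation on the example above sketches how fields and head can trim the results further. The choice of fields and the limit of 100 are assumptions made for illustration only:

```spl
index=audit sourcetype=access_combined host=admin action=failed
| fields user, action
| head 100
```

Here fields keeps only the two fields that later commands would need, and head keeps only the first 100 events returned, which is often enough when you are checking what the data looks like.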

Use Efficient SPL Commands

a) Choice of Commands

As mentioned above, you can arrive at the same results using a different combination of commands but your choice will determine the efficiency of your search. Below are a few tips:

  - Joins and Lookups – Avoid using multiple joins and lookups in your search. They are very resource-intensive. Perform joins and lookups only on the required data and consider using append over join.
  - Eval – Perform evaluations on the minimum number of events possible
  - Stats – Where possible, use stats command over the table command
  - Table – Use table command only at the end of your search since it cannot run until all results are returned to the search head.
  - Dedup – The stats command is more efficient than dedup. Consider listing the fields you want to dedup in the “by” clause of the stats command. For instance,

…| stats count by user action source dest

…| stats latest(_time) AS latest_time by user action source dest

Rather than

…| dedup user action source dest
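The advice to perform lookups only on the required data can be sketched as follows. The lookup name user_info and its output field department are hypothetical and do not come from any real configuration:

```spl
index=audit sourcetype=access_combined action=failed
| stats count by user
| lookup user_info user OUTPUT department
```

Because the lookup runs after stats, it is applied once per user rather than once per event. This assumes the enriched field is not needed for filtering or grouping earlier in the search.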

b) Order of Commands

The order in which commands are specified in a search is extremely important since this determines where the commands are executed which could be at the search head or on the indexers. When part or all of a search is run on the indexers, the search processes in parallel and search performance is much faster. It is good to parallelize as much work as possible. The aim is to not overburden the search head by making the indexer(s) do some of the work.

If your commands can be arranged so that they execute on the indexers, the overall search will run more quickly. Place commands that bring data to the search head as late as possible in your search.

Streaming and Non-Streaming Commands

Understanding streaming and non-streaming commands is important when discussing the order of commands in a search. Non-streaming commands include transforming commands such as stats, timechart, top and rare, as well as commands such as dedup, sort and append. They operate on the entire result set of event data, so the final results are produced on the search head regardless of where the command appears. To optimize your searches, place non-streaming commands as late as possible in your search string.

There are two types of streaming commands – Distributable and Centralized Streaming Commands. Distributable Streaming Commands such as eval, fields, rename, replace and regex operate on each event returned by a search regardless of the event order. They can be executed on the indexer. However, if any of the preceding search commands is executed on the search head, the distributable command will be executed on the search head as well. When possible, allow distributable streaming commands to precede non-streaming commands.

Similar to the distributable streaming commands, centralized streaming commands operate on each event returned by a search but event order is important and commands execute only on the search head.
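As a rough sketch of this ordering, the search below places two distributable streaming commands (fields and eval) ahead of a transforming command (stats). The usage_type field is a hypothetical name introduced for illustration, and the index and source type are borrowed from the examples in this post:

```spl
index=network sourcetype=cisco_wsa_squid
| fields username, usage
| eval usage_type = lower(usage)
| stats count AS connections by username usage_type
```

The fields and eval steps come before stats, so under the rules described above they are eligible to run on the indexers. The eval has to precede stats here because the grouping depends on the normalized value.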

To inspect your search, take a look at the Job Inspector and Search Job Properties. Two properties, remoteSearch and reportSearch, show where parts of your search string are executed. remoteSearch is the part of the search string executed on the remote nodes (indexers), while reportSearch is the part executed on the search head.

Let’s look at a simple example to illustrate this:

index=security user=admin failed

| timechart count span=1h

| stats avg(count) as average

In this example, the base search retrieves events from the index. The search head then executes the timechart and stats commands, which are both transforming commands, and outputs the results.

A second example:

index=network sourcetype=cisco_wsa_squid usage=violation

| stats count AS connections by username usage

| rename username as violator

| search connections >=10

In the above example, stats is a transforming command, so it will be executed on the search head. The rename command is a distributable streaming command that could execute on the indexer, but because it occurs after a transforming command, it will also execute on the search head.

For better performance, reorder the commands as follows so that “rename” precedes “stats” and is therefore executed on the indexer.

index=network sourcetype=cisco_wsa_squid usage=violation

| rename username as violator

| stats count AS connections by violator usage

| search connections >=10

Check the Job Inspector

Finally, check the Job Inspector to examine the overall statistics of your search, including where Splunk spent its time. Use it to troubleshoot search performance and to understand the impact of knowledge objects (lookups, tags) on processing.

Reference Links

https://docs.splunk.com/Documentation/Splunk/latest/Search/Writebettersearches

https://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutoptimization

https://docs.splunk.com/Documentation/Splunk/latest/Search/Quicktipsforoptimization

https://docs.splunk.com/Documentation/Splunk/latest/Search/Changethesearchmode
