Splunk’s Alerting Capability and a Common Problem Causing SVC Spikes 

By David Allen, Sr. Certified Splunk Consultant

Splunk is one of the largest and most innovative leaders in the observability platform industry, with log monitoring and alerting capabilities that are second to none. However, with all this power at the user’s fingertips, there are some potential issues that users need to be aware of as they start their Splunk journey. 

Initially, using Splunk is straightforward. When the environment is small and usage is low, it’s easy to configure alerts and scheduled searches to execute at the top of every hour without issue. Everyone does it because it is the easiest way to set things up. 

However, as more apps are installed, ingestion rises, and the number of alerts and scheduled searches grows, a significant problem emerges: all of these searches start running simultaneously, causing an SVC (Splunk Virtual Compute) spike and a possible performance impact. 

In particular, when a high volume of Splunk operations (say, 1,000 simultaneous alerts and scheduled searches) begins, they immediately consume SVC resources. If the cumulative SVC demand exceeds the system’s entitlement, Splunk is forced to prioritize: some searches run while others are delayed. The resulting performance bottlenecks and excessive runtimes can cause a cascading failure, as delayed searches may overlap with their subsequent scheduled runs, continuously compounding the resource strain. 
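To make the compounding effect concrete, here is a back-of-the-envelope sketch in plain Python. The numbers are made up for illustration (1,200 searches per 15-minute cycle, capacity to start 100 searches per minute), not real Splunk limits; it simply compares a top-of-the-cycle burst against the same workload spread evenly:

```python
# Toy model: each search occupies one "start slot" for one minute.
CAPACITY = 100        # searches the environment can start per minute (illustrative)
TOTAL = 1200          # searches scheduled per cycle (illustrative)
CYCLE = 15            # minutes between scheduled runs

def max_backlog(arrivals_per_minute, minutes=60):
    """Largest queue of waiting searches over the simulated window."""
    backlog, worst = 0, 0
    for t in range(minutes):
        backlog += arrivals_per_minute(t)
        backlog = max(0, backlog - CAPACITY)   # drain up to CAPACITY per minute
        worst = max(worst, backlog)
    return worst

# All 1,200 searches fire at the top of every cycle (minute 0, 15, 30, ...).
burst = max_backlog(lambda t: TOTAL if t % CYCLE == 0 else 0)

# The same 1,200 searches staggered evenly across the cycle (80 per minute).
staggered = max_backlog(lambda t: TOTAL // CYCLE)

print(burst, staggered)  # the burst leaves a large queue; the staggered load leaves none
```

Even though both scenarios run the exact same number of searches, only the burst builds a backlog that delays later runs.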

In the Splunk workload pricing model, customers purchase a fixed SVC capacity. Paying for hundreds of SVCs that sit idle 90% of the time just to absorb this spike is not an economical solution. 

The key question is: How can Splunk handle this situation, and how can this peak usage issue be minimized or avoided entirely? 

 

The Easiest Solution: Staggering Start Times 

The simplest way to reduce the SVC spike caused by many queries running concurrently is to stagger their start times a few minutes away from the top of the hour. 

For example, it is simple to set an alert to run every hour at the top of the hour, or at some offset from the top of the hour, and that is suitable for many applications. This article discusses what to do if you need the alert to run more frequently than once an hour, such as every 15 minutes. 

Below are the typical settings for having an alert run at the top of the hour. Notice that with this default setting there is no way to make the alert run more than once an hour, no matter what offset is selected. 

To make an alert run every 15 minutes, you need to set up the alert to run on a cron schedule. Do this by selecting “Run on Cron Schedule” from the dropdown as shown below. The cron syntax allows many scheduling options that are not available with any of the other dropdown menu options. 

Once “Run on Cron Schedule” is selected, you need to enter a cron expression, for example */15 * * * *. 
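For a quick sanity check of what that minute field means, here is a small sketch in plain Python (not part of Splunk) that expands the */n step syntax the same way cron does:

```python
def expand_step_minutes(field):
    """Expand a cron minute field of the form '*/n' into the minutes it matches."""
    assert field.startswith("*/"), "this sketch only handles the '*/n' form"
    step = int(field[2:])
    # '*/n' means every minute from 0 to 59 that is a multiple of n
    return list(range(0, 60, step))

print(expand_step_minutes("*/15"))  # [0, 15, 30, 45]
```

So */15 * * * * fires at :00, :15, :30, and :45 of every hour, which is exactly the top-of-the-hour clustering we want to avoid.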

You can learn the cron scheduling options from the various cron websites online. My favorite is crontab.guru, where you can experiment with cron settings for virtually any scenario you can imagine. 
 

By changing the cron setting to 5-59/15 you can see that the query will run every 15 minutes starting at 5 minutes after the top of the hour (at :05, :20, :35, and :50). Notice the run times in the red square. 
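The offset form expands the same way. Here is a hypothetical helper (plain Python again, just to illustrate the arithmetic) for a start-end/step minute field:

```python
def expand_range_step(field):
    """Expand a cron minute field like '5-59/15' into the minutes it matches."""
    span, step = field.split("/")
    start, end = (int(x) for x in span.split("-"))
    # 'a-b/n' means every n-th minute from a through b inclusive
    return list(range(start, end + 1, int(step)))

print(expand_range_step("5-59/15"))  # [5, 20, 35, 50]
```

The cadence is still every 15 minutes; only the starting minute has shifted.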

By using this offset technique you can distribute the start times of all your alert queries and eliminate the high SVC spike at the top of the hour. 
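Assigning those offsets by hand across hundreds of alerts is tedious. A round-robin sketch (plain Python; the alert names are made up) shows one way to spread a batch of 15-minute alerts across all 15 available offsets:

```python
def staggered_cron(alert_names, period=15):
    """Assign each alert a minute offset 0..period-1 round-robin and build its cron expression."""
    schedules = {}
    for i, name in enumerate(alert_names):
        offset = i % period
        # offset 0 keeps the familiar '*/15'; otherwise use the range/step form
        field = f"*/{period}" if offset == 0 else f"{offset}-59/{period}"
        schedules[name] = f"{field} * * * *"
    return schedules

alerts = [f"alert_{n:03d}" for n in range(45)]  # 45 hypothetical alerts
for name, cron in list(staggered_cron(alerts).items())[:4]:
    print(name, cron)
```

With 45 alerts and 15 offsets, no more than 3 alerts share any given start minute, flattening the load curve.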

Here is how this would look on the Alert setup page. 

Lastly, when using a cron schedule, remember to set the Time Range of the search. Generally this should match the query frequency; unless you have a non-standard use case, there is no reason to set a longer Time Range than the cron expression frequency. 
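In savedsearches.conf terms, a 15-minute alert offset to :05 with a matching 15-minute time range might look like the stanza below (the stanza name, index, and search string are placeholders, not a real alert):

```ini
[My Example Alert]
cron_schedule = 5-59/15 * * * *
dispatch.earliest_time = -15m
dispatch.latest_time = now
search = index=main sourcetype=my_sourcetype ERROR
enable_sched = 1
```

Note how the -15m earliest time matches the 15-minute cron cadence, so each run covers exactly the window since the previous run.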

 

 
So, wrapping it up, here is a sample alert offset distribution in minutes for a large enterprise. 

After you are done making the changes to “flatten the curve,” the alert offset distribution bar chart may look something like this: 

Here is the query that generates the above bar chart. 

Here is the soft copy of the search: 
 

| rest /servicesNS/-/-/saved/searches timeout=0 
| fields title eai:acl.app cron_schedule label eai:acl.sharing search disabled 
| rename eai:acl.* AS * 
| eval cron_count = len(cron_schedule) 
| search cron_count != 0 AND disabled = 0 
| regex cron_schedule="^[0-9]*-[0-9\/]*\s" 
| rex field=cron_schedule "^(?<offset_minutes>[0-9]*)-[0-9]*\/(?<refresh>[0-9]*)\s" 
| append [ 
    | rest /servicesNS/-/-/saved/searches timeout=0 
    | fields title eai:acl.app cron_schedule label eai:acl.sharing search disabled 
    | rename eai:acl.* AS * 
    | eval cron_count = len(cron_schedule) 
    | search cron_count != 0 AND disabled = 0 
    | regex cron_schedule="^\*\/\d*\s" 
    | rex field=cron_schedule "^\*\/(?<refresh>[0-9]*)\s" 
    | eval offset_minutes=0] 
| stats count BY offset_minutes 
| sort 0 offset_minutes
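If you want to sanity-check the two rex extractions above outside of Splunk, the equivalent patterns can be exercised with Python's re module (a sketch; the sample cron strings are illustrative):

```python
import re

# Mirrors: rex "^(?<offset_minutes>[0-9]*)-[0-9]*\/(?<refresh>[0-9]*)\s"
offset_form = re.compile(r"^(?P<offset_minutes>[0-9]*)-[0-9]*/(?P<refresh>[0-9]*)\s")

# Mirrors: rex "^\*\/(?<refresh>[0-9]*)\s"
star_form = re.compile(r"^\*/(?P<refresh>[0-9]*)\s")

m = offset_form.match("5-59/15 * * * *")
print(m.group("offset_minutes"), m.group("refresh"))  # 5 15

m2 = star_form.match("*/15 * * * *")
print(m2.group("refresh"))  # 15
```

The first pattern captures the offset and refresh interval from the range/step form, while the second matches the un-offset */n form, which the query then assigns offset_minutes=0.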

Remember to be nice to your environment and keep it from having an SVC spike at the top of the hour. By using the technique described in this article, you can make your Splunk environment run more efficiently and economically. 


About the Author

David Allen has over 35 years of experience in the information technology industry, including hardware design, software development, and entrepreneurship. He has extensive experience with various programming languages, development tools, and Splunk. David exhibited his entrepreneurial skills when he founded his own AV company and ran it successfully for over 15 years, using Splunk as its main data analytics software. As a Sr. Splunk Consultant, he assists others with their Splunk issues and is constantly learning new technology, especially everything Splunk. David holds both a Bachelor of Science in Electrical Engineering and a Bachelor of Science in Computer Science Engineering from LeTourneau University in Longview, Texas, as well as two United States patents. David currently resides in Richardson, Texas with his family.