Find Hosts Not Ingesting into Splunk
By Aaron Dobrzeniecki, Senior Splunk Consultant
As my day was about to end, I was tasked with an extremely time-consuming mission. After I had spent over 30 minutes researching a way to reach my end goal, one of my colleagues reached out to me with the ultimate resolution. Now, you may be wondering what my end goal was. I had a list of 99 hosts, and when I ran the Splunk SPL search below, Splunk returned only the 53 hosts that WERE ingesting into my index.
“What is Splunk SPL?”, you may be asking. Splunk SPL (Search Processing Language) is the powerful query language used in Splunk for searching, analyzing, and visualizing machine-generated data. SPL commands are written in the Splunk search bar to interact with data indexed in Splunk. The TekStream team knows Splunk SPL well – we even spoke about this topic at .conf 2024!
So, back to my end goal: how can I easily and efficiently get a list of only the hosts that are NOT currently ingesting into my index?
index=<MYINDEX> host IN (LIST OF HOSTS) | stats count by host | fields - count
Using the search above, I could take the results (which omit the count column, making the hosts easier to copy and paste), download software such as Beyond Compare 4, and compare the complete list of hosts against the hosts in my search results. Not only would this take more work and time, but I would also have to use external software to achieve my goal. The easiest way to achieve our end goal is to do it entirely in Splunk SPL.
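Conceptually, that comparison is a set difference: the complete host list minus the hosts the search returned. The sketch below models the idea in Python outside Splunk, purely for illustration. The host names are hypothetical, and comparing case-insensitively is an assumption that mirrors the case normalization the SPL search uses.

```python
def hosts_not_ingesting(all_hosts, indexed_hosts):
    """Return hosts on the full list that the index search did not return.

    Host names are compared case-insensitively (an assumption of this sketch,
    mirroring the upper()/lower() normalization in the SPL search).
    """
    indexed = {h.lower() for h in indexed_hosts}
    # Set difference: everything on the full list that was not indexed.
    return sorted({h.lower() for h in all_hosts} - indexed)


# Hypothetical example data, not from the original environment.
all_hosts = ["web01", "web02", "DB01", "db02"]
indexed_hosts = ["WEB01", "db01"]
print(hosts_not_ingesting(all_hosts, indexed_hosts))  # ['db02', 'web02']
```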
The search below is an adaptation of my search, but it includes some ingenious logic to achieve our goal. ***WHEN COPYING AND PASTING THE SEARCH BELOW, PLEASE MAKE SURE YOU FIX THE DOUBLE QUOTES; THEY MAY PASTE INCORRECTLY AS CURLY QUOTES.***
index=<MYINDEX> host IN (YOUR HOSTS)
| eval host=upper(host)
| dedup host
| fields host
| eval DATASET="INDEXEDHOSTS"
| append
[| makeresults
| eval host="LIST OF YOUR HOSTS"
| eval host=upper(host)
| rex mode=sed field=host "s/[\n\r]/ /g"
| makemv host
| mvexpand host
| eval DATASET="ALLHOSTS"]
| stats values(*) AS * BY host
| eval Count=mvcount(DATASET)
| search Count=1 AND DATASET="ALLHOSTS"
| eval host=lower(host)
| table host
| sort host
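I will break the search into two parts. The first part searches our index for our hosts. I have added the upper() eval function so that all the host names are capitalized (you can use upper() or lower() here, as long as both parts of the search use the same one, because the hosts from the two parts must match exactly when they are grouped later). The dedup and fields commands then leave a single result per host. The goal of the first part of the search is to list all the hosts that are ingesting into our index and label them as INDEXEDHOSTS.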
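Uppercasing matters because the stats ... BY host step later in the search groups on the exact field value, so WEB01 and web01 would otherwise end up in separate rows. Here is a minimal Python model of the first part, for illustration only; it assumes the index search has already returned each event's host value, and the host names are hypothetical.

```python
def indexed_hosts_dataset(event_hosts):
    """Model of: eval host=upper(host) | dedup host | fields host
    | eval DATASET="INDEXEDHOSTS".

    event_hosts is the host value of each event returned by the index search.
    """
    rows = []
    seen = set()
    for host in event_hosts:
        host = host.upper()          # eval host=upper(host)
        if host in seen:             # dedup host: keep the first occurrence only
            continue
        seen.add(host)
        rows.append({"host": host, "DATASET": "INDEXEDHOSTS"})
    return rows


# Hypothetical events: the same host appears in mixed case and many times.
events = ["web01", "WEB01", "web01", "db01"]
print(indexed_hosts_dataset(events))
# [{'host': 'WEB01', 'DATASET': 'INDEXEDHOSTS'}, {'host': 'DB01', 'DATASET': 'INDEXEDHOSTS'}]
```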
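We then append the second part of our search to the first. The goal of the second part is to create a dataset from our complete list of hosts and label it ALLHOSTS. The makeresults command creates a single result, and the eval sets its host field to the list of hosts pasted in as one string, which is then capitalized to match the first part. The rex command replaces every newline and carriage return in that list with a space, so the hosts form a single space-separated line. The makemv command then splits that line into a multivalue host field (makemv splits on a single space by default), and mvexpand expands that field into one result per host.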
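The string handling in the second part can be modeled the same way. This Python sketch assumes the list was pasted one host per line; the host names are hypothetical, and dropping empty tokens is a choice of this sketch rather than a documented makemv behavior.

```python
import re


def all_hosts_dataset(pasted_list):
    """Model of the appended subsearch: upper(), the sed-mode rex,
    makemv, mvexpand, and eval DATASET="ALLHOSTS".

    pasted_list is the host list pasted into eval host="...".
    """
    text = pasted_list.upper()                 # eval host=upper(host)
    text = re.sub(r"[\n\r]", " ", text)        # rex sed: newline/CR -> space
    # makemv splits on a single space by default; this sketch also drops the
    # empty tokens left behind when, e.g., "\r\n" becomes two spaces.
    values = [v for v in text.split(" ") if v]
    # mvexpand: one result row per value, each labelled ALLHOSTS.
    return [{"host": v, "DATASET": "ALLHOSTS"} for v in values]


# Hypothetical pasted list, one host per line with Windows line endings.
pasted = "web01\r\nweb02\r\ndb01"
print([row["host"] for row in all_hosts_dataset(pasted)])  # ['WEB01', 'WEB02', 'DB01']
```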
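After the two parts are appended together, we run stats values(*) AS * BY host, which collapses the results into one row per host and collects every distinct value of the other fields, including DATASET. Following this, we run an eval that creates Count using mvcount on DATASET. The mvcount function returns the number of values in a multivalue field for each result: a host that is on our list and ingesting has two DATASET values (ALLHOSTS and INDEXEDHOSTS), so its Count is 2, while a host that is only on our list has a Count of 1. The search command then keeps only results with Count=1 and DATASET="ALLHOSTS"; the Count=1 condition is needed because DATASET="ALLHOSTS" alone would also match hosts that carry both labels. What remains are the hosts that are not ingesting, which we convert back to lowercase, display in a table, and sort.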
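The final filter boils down to keeping hosts whose only DATASET label is ALLHOSTS. Below is a minimal Python model of that last part, assuming the appended results from both parts arrive as one dictionary per row; the host names are hypothetical.

```python
from collections import defaultdict


def hosts_only_in_full_list(rows):
    """Model of: stats values(*) AS * BY host | eval Count=mvcount(DATASET)
    | search Count=1 AND DATASET="ALLHOSTS" | eval host=lower(host)
    | table host | sort host.
    """
    # stats values(DATASET) BY host: the distinct DATASET labels per host.
    datasets = defaultdict(set)
    for row in rows:
        datasets[row["host"]].add(row["DATASET"])

    missing = []
    for host, labels in datasets.items():
        count = len(labels)                        # eval Count=mvcount(DATASET)
        if count == 1 and "ALLHOSTS" in labels:    # only on the full list
            missing.append(host.lower())           # eval host=lower(host)
    return sorted(missing)                         # sort host (lexicographic here)


# Hypothetical appended results from both parts of the search.
rows = [
    {"host": "WEB01", "DATASET": "INDEXEDHOSTS"},
    {"host": "DB01", "DATASET": "INDEXEDHOSTS"},
    {"host": "WEB01", "DATASET": "ALLHOSTS"},
    {"host": "WEB02", "DATASET": "ALLHOSTS"},
    {"host": "DB01", "DATASET": "ALLHOSTS"},
]
print(hosts_only_in_full_list(rows))  # ['web02']
```

A host returned only by the first part (for example, if the two parts used different casing) would also have a Count of 1, but with the INDEXEDHOSTS label, which is why the filter checks both conditions.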
Please see the screenshots below, which show my search and the results. For testing purposes, I am using Splunk's internal index (_internal). The last three hosts in my list, which start with p204, do not exist and therefore are not ingesting into the internal index. As you can see from the first screenshot, all the other hosts are ingesting into the internal index.
As shown above, the search I ran accomplished my goal of identifying the hosts that are not ingesting into the internal index. The three hosts in the results are exactly those hosts, which gave me my resolution very quickly.
In conclusion, being able to quickly and efficiently find which hosts in a list are not ingesting into Splunk, or into a certain index, lets you get to the root cause of the problem faster. Identifying the root cause, whether it is a configuration error, a connectivity issue, or another technical challenge, is the second step of our process towards resolution. Implementing corrective measures promptly, and scheduling the search above as a report so that new gaps are caught early, helps ensure an uninterrupted flow of data into the designated indexes. If you are having configuration problems with your hosts, consider reviewing this TekStream blog as well.