Splunk for Monitoring: Pinger APP!

By Bryan Bollou, Splunk Consultant

One of the many uses for Splunk is monitoring an environment and its assets. This can be done in different ways, and this article highlights two of them.

The first is monitoring the logs the machines are supposed to be sending to Splunk. If a log successfully gets to Splunk, this shows that log generation, Splunk forwarding, and the Splunk service are all working. If a host stops sending logs, it can be an indication that the machine is down. In some environments, however, logs may stop due to system restarts, configuration changes, low-traffic periods, etc. In those cases the host is up but simply not sending logs at the moment. This means the simplest way of checking whether a machine is up can still be useful: ping. This is the second method discussed in this blog post for monitoring assets reporting to a Splunk environment.

Monitoring Hosts with Logging

Monitoring hosts with logging involves first understanding what logs are expected and establishing a time threshold after which they are considered “stopped”. This threshold can differ: some logs are expected to be constantly noisy, such as firewall logs, while others depend on user activity and/or time of day, such as transaction logs.

The following SPL searches can be used to get visibility into the current indexes and whether they have received logs within a certain period of time:

Find and List all Indexes

| eventcount summarize=false index=* index=_* 
| dedup index 
| fields index

Look at indexes in the last 24 hours; if an index has not sent logs in the last 5 minutes, list it out.

| tstats latest(_time) as latest where index=* earliest=-24h by index
| eval recent = if(latest > relative_time(now(),"-5m"),1,0), realLatest = strftime(latest,"%c")
| where recent=0
Splunk search interface showing a tstats query that evaluates data activity by index over the past 24 hours. The query flags inactive indexes by checking if their latest event is older than 5 minutes. The table includes index names, timestamps, and activity status.

The results of this search could be used to make an alert for when an index stops receiving logs. This, however, could be too general if you have multiple hosts sending to a particular index. Thus, splitting the search by host could yield more specific and useful results.

Look at hosts in the last 24 hours; if a host has not sent logs in the last 5 minutes, list it out.

| tstats latest(_time) as latest where index=* earliest=-24h by host
| eval recent = if(latest > relative_time(now(),"-5m"),1,0), realLatest = strftime(latest,"%c")
| where recent=0
Splunk dashboard displaying a tstats search segmented by host, highlighting systems that stopped sending data in the last 24 hours. Hosts are listed with timestamps and an inactivity flag, indicating potential connectivity or service issues.

This SPL can be saved as an alert with the parameters below, notifying a customer whenever one of their hosts stops sending logs to Splunk for X amount of time (5 minutes in this example).

Splunk alert configuration panel for setting up a scheduled alert titled “Host Stopped Sending Alert.” The alert runs every 10 minutes, triggers if result count is greater than zero, and suppresses repeated notifications for the same host within 60 minutes.
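For reference, the same alert could be expressed in savedsearches.conf roughly as follows. This is only a sketch based on the settings visible in the screenshot (10-minute schedule, trigger on more than zero results, 60-minute per-host throttling, email action); the stanza name and email address are placeholders.

[Host Stopped Sending Alert]
search = | tstats latest(_time) as latest where index=* earliest=-24h by host | eval recent = if(latest > relative_time(now(),"-5m"),1,0), realLatest = strftime(latest,"%c") | where recent=0
dispatch.earliest_time = -24h
dispatch.latest_time = now
enableSched = 1
cron_schedule = */10 * * * *
counttype = number of events
relation = greater than
quantity = 0
alert.track = 1
alert.suppress = 1
alert.suppress.fields = host
alert.suppress.period = 60m
action.email = 1
action.email.to = recipient@example.com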

An email alert action would look something like this:

Email alert from Splunk titled “Host Stopped Sending Alert” showing that a host (10.8.0.6) stopped sending data. Includes event timestamp details and a direct link to view results in Splunk. Attached is a CSV export of the alert result.

Monitoring Hosts with Ping

To expand on the above alert, we can use Splunk to find out if a machine is up or down based on ping status. For this, I will go over an app I created that pings the machines listed in a lookup file and sends the results in an event format to Splunk.

A prerequisite for this app and the script below to work is for the machine (where the script resides) to be able to ping all other machines it targets. For larger and segregated environments, the script can be deployed to multiple machines that have different “ping access” in order to have more coverage.

Below is the commented bash script that reads the lookup tables in the pinger app directory and pings the hosts they list one by one. Note: the LOOKUP_LOCATION variable differs depending on whether the script runs on a full Splunk install or a Splunk Universal Forwarder; line 6 of the script needs to be uncommented for the former (full install) and line 5 for the latter (Universal Forwarder). Machines that are pinged and respond will print an “up” status log, and those that don’t will print a “down” status log.

Screenshot of a shell script (ping.sh) that loops through CSV lookup files to ping host IPs and echo their status (up or down) with associated Splunk fields like host, index, and sourcetype. The script reads CSV headers and performs a 5-ping test on each IP address.
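The exact script is the one shown in the screenshot above. As a simplified, illustrative sketch (not the original, so the line numbers referenced in this post will not match), a script following the same logic might look like this; the paths and echo format are assumptions based on the descriptions in this post:

#!/bin/bash
# Sketch of a pinger script: read CSV lookups and ping each IP listed in them.
# LOOKUP_LOCATION differs between a Universal Forwarder and a full Splunk install;
# keep only the line that matches your installation (paths are examples).
#LOOKUP_LOCATION="/opt/splunkforwarder/etc/apps/pingerapp/lookups"   # Universal Forwarder
LOOKUP_LOCATION="/opt/splunk/etc/apps/pingerapp/lookups"             # full Splunk install

for LOOKUP in "$LOOKUP_LOCATION"/*.csv; do
    # Skip the header row, then read the columns in order: ip, host, index, sourcetype
    tail -n +2 "$LOOKUP" | while IFS=, read -r ip host index sourcetype; do
        # Send 5 pings; the exit code tells us whether the host responded
        if ping -c 5 "$ip" > /dev/null 2>&1; then
            echo "status=\"up\" ip=\"$ip\" host=\"$host\" index=\"$index\" sourcetype=\"$sourcetype\""
        else
            echo "status=\"down\" ip=\"$ip\" host=\"$host\" index=\"$index\" sourcetype=\"$sourcetype\""
        fi
    done
done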

The script currently looks for a CSV with the headers “ip”, “host”, “index” and “sourcetype” – in that order. This can be changed by modifying line 20 of the script shown in the screenshot. An example of a lookup file this script would work with is here:

Spreadsheet showing a sample lookup file used by the Pinger app, listing columns for IP address, host, index, and sourcetype. Each row defines a system to be monitored via the script
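For illustration, such a lookup file might contain rows like these (the host names echo those in the sample output later in this post; the IP values are only examples):

ip,host,index,sourcetype
127.0.0.1,local,main,pinger
8.8.8.8,google,main,pinger
208.67.222.222,opendns,main,pinger
10.255.255.1,nowhere,main,pinger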

Along with the script, an inputs configuration is needed to point to the script above and run it at an interval. In this example the interval is 240 seconds, or 4 minutes. The results of the script will then be forwarded to the main index with the sourcetype pinger.

Configuration snippet from inputs.conf that schedules the ping.sh script to run every 240 seconds in Splunk. Specifies the script path, interval, index name (main), and sourcetype (pinger).
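A minimal inputs.conf along those lines might look as follows; the app name pingerapp matches the structure shown later in this post, and the values should be adjusted for your environment:

[script://$SPLUNK_HOME/etc/apps/pingerapp/bin/ping.sh]
disabled = 0
interval = 240
index = main
sourcetype = pinger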

To ensure the logs that are sent to Splunk come in a good format, a props configuration needs to be created. In the configuration below, the sourcetype pinger disables line merging and uses the default newline line breaker. The logs do not contain a timestamp, so they will take the time the script is run on the machine.

Configuration file (props.conf) defining how events from the Pinger app are parsed in Splunk. Includes settings for line breaking, truncation limit, timestamp lookahead, and placeholders for time prefix and format.
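A props.conf sketch matching that description might look like the following; the truncation and lookahead values are placeholders, and DATETIME_CONFIG = CURRENT is one way to stamp events with the time the script runs:

[pinger]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 10000
MAX_TIMESTAMP_LOOKAHEAD = 25
DATETIME_CONFIG = CURRENT
# TIME_PREFIX and TIME_FORMAT are left unset since the events carry no timestamp
#TIME_PREFIX =
#TIME_FORMAT =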

This props configuration needs to go on the first full instance of Splunk the pinger app’s logs will touch. For example, let’s say the pinger app is on a universal forwarder that forwards its logs to a heavy forwarder before they are sent to an indexer in a Splunk Cloud instance. In this case, the props.conf would be needed on the heavy forwarder. If the universal forwarder sent logs straight to Splunk Cloud, the props.conf would need to be on the Splunk Cloud indexers. If the pinger app is installed directly on a heavy forwarder rather than a universal forwarder, the props.conf needs to be on that heavy forwarder instance.

Below is an example of the structure of the pinger app. The script goes into the app_name/bin directory. Lookup tables with the hostnames and IP addresses to ping go under the app_name/lookups directory. The configuration files go under the app_name/default directory.

Terminal output showing the directory structure of a custom Splunk app named pingerapp, including subfolders like bin, default, lookups, and metadata, and configuration files like inputs.conf, props.conf, and a ping.sh script.
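In plain text, the layout looks roughly like this (the lookup filename is just an example):

pingerapp/
├── bin/
│   └── ping.sh
├── default/
│   ├── inputs.conf
│   └── props.conf
├── lookups/
│   └── ping_hosts.csv
└── metadata/
    └── default.meta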

The app will reside in the $SPLUNK_HOME/etc/apps directory, for example:

Terminal view displaying the path /Applications/Splunk/etc/apps and a list of installed Splunk apps including pingerapp, search, sample_app, and several third-party or custom add-ons.

To check if the app will work for the machine it is installed on, you can navigate to the $SPLUNK_HOME/etc/apps/pingerapp/bin directory and run the script manually.

Terminal output from executing the ping.sh script, showing simulated ping results with status (up or down), IP addresses, and hostnames such as local, google, nowhere, and opendns.
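Using the sample lookup above, a manual run might produce output along these lines (illustrative only; the exact format depends on the script):

cd $SPLUNK_HOME/etc/apps/pingerapp/bin
./ping.sh
status="up" ip="127.0.0.1" host="local" index="main" sourcetype="pinger"
status="up" ip="8.8.8.8" host="google" index="main" sourcetype="pinger"
status="down" ip="10.255.255.1" host="nowhere" index="main" sourcetype="pinger"
status="up" ip="208.67.222.222" host="opendns" index="main" sourcetype="pinger"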

Once the app is properly installed and the Splunk instance is restarted, the script will start generating logs and forwarding them to the appropriate index. The logs will appear as shown below:

Splunk event viewer showing log data ingested from the ping.sh script, with fields like status, IP, host, source path to the script, index, and sourcetype indicating the data came from a custom pinger app.

These logs can then be used to discover the machines whose latest status is “down” (their last ping test did not return results, as per the script). These are machines that are not currently pingable in the environment.

Splunk SPL search that filters for status=down events and uses stats, rename, and fields commands to list IP addresses of hosts currently reported as down, based on 25,808 matching events.
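The exact SPL is in the screenshot; a sketch of an equivalent search, assuming the field names produced by the script (status, ip, host), could look like this:

index=main sourcetype=pinger
| stats latest(status) as status by ip host
| where status="down"
| rename ip as ip_address
| fields ip_address host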

To get the best of both “monitoring” worlds, a combined search can be used. A prerequisite for this to work is that the hostname and/or IP address a machine uses when sending logs must match the one used when it is pinged. A joined search like the one below can then take the list of hosts that stopped sending logs and feed it into a search looking for hosts that are not pingable.

Advanced SPL search in Splunk Cloud using a join with tstats to detect hosts that are down and also not sending logs, highlighting one host (10.100.0.4) meeting the condition of “no logs” in the past 24 hours.
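The precise search is again in the screenshot above; one hedged way to express the combination, reusing the earlier searches and assuming matching host values, is:

| tstats latest(_time) as latest where index=* earliest=-24h by host
| eval recent = if(latest > relative_time(now(),"-5m"),1,0), realLatest = strftime(latest,"%c")
| where recent=0
| join type=inner host
    [ search index=main sourcetype=pinger
      | stats latest(status) as status by host
      | where status="down" ]
| table host realLatest status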

This can be made into an alert like this:

Splunk “Save As Alert” configuration panel titled “Host Not Pingable and Stopped Sending Logs,” with settings to run every 10 minutes, trigger when results are greater than zero, and send email notifications.
Splunk alert configuration for email notifications, showing message details including dynamic variables like $name$ in the subject and message body, with options selected to include links and attach CSV results.

Tools like this alert and the customizable pinger app can be used to enhance the monitoring capabilities of Splunk. For questions on how to customize this script for your environment or how to use Splunk to monitor your assets, ask one of our consultants by filling in the form below. Happy Splunking!