Re-Index Raw Splunk Events to a New Index

      By: Zubair Rauf  |  Splunk Consultant, Team Lead

 

A few days ago, I came across a rare use case in which a user had to re-index a specific subset of raw Splunk events from their data into another index. This was historical data and could not easily be routed to a new index at index time.

After much deliberation on how to move this data over, we settled on the summary index method, using the collect command. This would enable us to search for the specific events we wanted and re-index them into a separate index.

When re-indexing raw events using the “collect” command, Splunk automatically assigns the source as search_id (for ad-hoc search) and saved search name (for scheduled searches), sourcetype as stash, and host as the Splunk server that runs the search. To change these, you can specify these values as parameters in the collect command. The method we were going to follow was simple – build the search to return the events we cared about, use the collect command with the “sourcetype,” “source,” and “host” values to get the original values of source, sourcetype, and host to show up in the new event.

To our dismay, this was not what happened, as anything added to those fields is treated as a literal string and doesn’t take the dynamic values of the field being referenced. For example, host = host would literally change the host value to “host” instead of the host field in the original event. We also discovered that when summarizing raw events, Splunk will not add orig_source, orig_sourcetype, and orig_index to the summarized/re-indexed events.

To solve this problem, I had to get creative, as there was no easy and direct way to do that. I chose props and transforms to solve my problem. This method is by no means perfect and only works with two types of events:

  • – Single-line plaintext events
  • – Single-line JSON events

Note: This method is custom and only works if all steps are followed properly. The transforms extract field values based on regex, therefore everything has to be set up with care to make sure the regex works as designed. This test was carried out on a single-instance Splunk server, but if you are doing it in a distributed environment, I will list the steps below on where to install props and transforms.

The method we employed was simple:

  1. Make sure the target index is created on the indexers.
  2. Create three new fields in search for source, sourcetype, and host.
  3. Append the new fields to the end of the raw event (for JSON, we had to make sure they were part of the JSON blob).
  4. Create Transforms to extract the values of those fields.
  5. Use props to apply those transforms on the source.

I will outline the process I used for both types of events using the _internal and _introspection indexes, in which you can find single-line plaintext events and single-line JSON events, respectively.

Adding the Props and Transforms

The best way to add the props and transforms is to package them up in a separate app and push them out to the following servers:

  1. Search Head running the search to collect/re-index the data in a new index (If using a cluster, use the deployer to push out the configurations).
  2. Indexers that will be ingesting the new data (If using a clustered environment, use the Indexer cluster Manager Node, previously Master Server/Cluster Master to push out the configuration).

Splunk TA

I created the following app and pushed it out to my Splunk environment:

TA-collect-raw-events/
└── local
    ├── props.conf
    └── transforms.conf

The files included the following settings:

Transforms.conf

[setDynamicSource]
FORMAT = source::$1
REGEX = ^.*myDynamicSource=\"([^\"]+)\"
DEST_KEY= MetaData:Source

[setDynamicSourcetype]
FORMAT = sourcetype::$1
REGEX = ^.*myDynamicSourcetype=\"([^\"]+)\"
DEST_KEY= MetaData:Sourcetype

[setDynamicHost]
FORMAT = host::$1
REGEX = ^.*myDynamicHost=\"([^\"]+)\"
DEST_KEY= MetaData:Host

[setDynamicSourceJSON]
FORMAT = source::$1
REGEX = ^.*\"myDynamicSourceJSON\":\"([^\"]+)\"
DEST_KEY= MetaData:Source

[setDynamicSourcetypeJSON]
FORMAT = sourcetype::$1
REGEX = ^.*\"myDynamicSourcetypeJSON\":\"([^\"]+)\"
DEST_KEY= MetaData:Sourcetype

[setDynamicHostJSON]
FORMAT = host::$1
REGEX = ^.*\"myDynamicHostJSON\":\"([^\"]+)\"
DEST_KEY= MetaData:Host

Props.conf

[source::myDynamicSource]
TRANSFORMS-set_source = setDynamicSource, setDynamicSourcetype, setDynamicHost

[source::myDynamicSourceJSON]
TRANSFORMS-set_source = setDynamicSourceJSON, setDynamicSourcetypeJSON, setDynamicHostJSON

Once the TA is successfully deployed on the Indexers and Search Heads, you can use the following searches to test this solution.

Single-line Plaintext Events

The easiest way to test this is by using the _internal, splunkd.log data as it is always generating when your Splunk instance is running. I used the following search to take ten sample events and re-index them using the metadata of the original event.

index=_internal earliest=-10m@m latest=now
| head 10
| eval myDynamicSource= source
| eval myDynamicSourcetype= sourcetype
| eval myDynamicHost= host
| eval _raw = _raw." myDynamicSource=\"".myDynamicSource."\" myDynamicSourcetype=\"".myDynamicSourcetype."\" myDynamicHost=\"".myDynamicHost."\""
| collect testmode=true index=collect_test source="myDynamicSource" sourcetype="myDynamicSourcetype" host="myDynamicHost"

Note: Set testmode=false when you want to actually index the new data; testmode=true only tests your search so you can confirm it works.

The search appends the metadata fields (created with eval) to the newly indexed events. Note that this method consumes license when the data is indexed again, because the sourcetype is not stash.

Single-line JSON Events

To test JSON events, I am using the Splunk introspection logs from the _introspection index. This search also extracts ten desired events and re-indexes them in the new index. This search inserts metadata fields into the JSON event:

index=_introspection sourcetype=splunk_disk_objects earliest=-10m@m latest=now
| head 10
| eval myDynamicSourceJSON=source
| eval myDynamicSourcetypeJSON=sourcetype
| eval myDynamicHostJSON=host
| rex mode=sed "s/.$//"
| eval _raw = _raw.",\"myDynamicSourceJSON\":\"".myDynamicSourceJSON."\",\"myDynamicSourcetypeJSON\":\"".myDynamicSourcetypeJSON."\",\"myDynamicHostJSON\":\"".myDynamicHostJSON."\"}"
| collect testmode=true index=collect_test source="myDynamicSourceJSON" sourcetype="myDynamicSourcetypeJSON" host="myDynamicHostJSON"

The data in the original events does not pretty-print as JSON, but all fields are extracted in the search as they are in the raw Splunk event.

The events are re-indexed into the new index, and with the | spath command, all the fields from the JSON are extracted as well as visible under Interesting Fields.

One thing to note here is that this is not a mass re-indexing solution. This is good for a very specific use case where there are not a lot of variables involved.

To learn more about this or if you need help with implementing a custom solution like this, please feel free to reach out to us.

Splunk Upgrade Script

      By: Chris Winarski  |  Splunk Consultant

 

We have all run into occasional difficult situations when upgrading Splunk environments, but have you ever had to upgrade many boxes all at once? The script below may help with that, and if properly tailored to your environmental settings, can ease the pain of Splunk upgrades across vast environments. I have put the script in the plainest terms possible and added comments to increase readability so that even the most inexperienced Splunk consultant can create a successful Splunk upgrade deployment.

The script is separated into three parts, only one of which requires your input and customization for the script to function properly. The variables are the most important part, as they describe what your environment looks like. The rest of the script should not need updating (other than customization for your environment), but feel free to omit anything you don’t wish to include. The script-execution section may not need any changes, but if your devices do not use SSH keys, I have left in the line “#ssh -t “$i” “$REMOTE_UPGRADE_SCRIPT””. Just remove its pound sign and put a pound sign in front of the line above it.
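For reference, the hostlist file the script reads is simply one hostname or IP address per line, nothing else. A hypothetical example (these hosts are placeholders):

splunk-sh01.example.com
splunk-idx01.example.com
10.0.1.25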

 

Splunk Upgrade Script

#!/usr/bin/env bash

### ========================================== ###
###                  VARIABLES                 ###
### ========================================== ###

HOST_FILE="hostlist" #Create a file on the local instance where you run this from called "hostlist" with hosts, *IMPORTANT - ONLY 1 host per line

SPLUNK_USER="splunk:splunk" #Splunk user and group, this can vary from environment to environment, however, i have populated the defaults

BACKUP_LOCATION="/tmp" #Where you would like the backup of your Splunk etc directory to be saved, /tmp is the chosen default

BACKUP_NAME="etc.bkp.tgz" #The backup file (this is an arbitrary name), however, keep the .tgz format for the purpose of this script

DOWNLOADED_FILE="splunk_upgrade_download.tgz" #What your downloaded upgrade package is going to be called, you can change this, however, keep the .tgz file format

SPLUNK_HOME="/opt/splunk" #Default home directory, again, change per your environment needs

PRIVATE_KEY_PATH="~/key" #Path to the private key used for SSH; the matching public key must be present on your target hosts

BASE_FOLDER="/opt" #This is the base folder in which Splunk resides. It is also where the downloaded upgrade will be placed and untarred. /opt is the default, and best practice is to install Splunk in this location

SSH_USER="ec2-user" #This is the user on your target machine which has sudo permissions **Very Important**

#1. Go to https://www.splunk.com/en_us/download/previous-releases.html and click on your operating system, and what version of splunk you will be upgrading to
#2. Click "Download Now" for the .tgz.
#3. It will redirect you to another page and in the upper right you'll see a block with "Download via Command Line (wget)". Click that and copy the URL in between the ' ' and starting with https://
URL="'https://www.splunk.com/bin/splunk/DownloadActivityServlet?architecture=x86_64&platform=linux&version=8.2.0&product=splunk&filename=splunk-8.2.0-e053ef3c985f-Linux-x86_64.tgz&wget=true'"

### ========================================== ###
###            REMOTE UPGRADE SCRIPT           ###
### ========================================== ###

REMOTE_UPGRADE_SCRIPT="
#Stopping Splunk as Splunk user..
sudo -u $SPLUNK_USER $SPLUNK_HOME/bin/splunk stop

#Creating Backup of /opt/splunk/etc and placing it into your designated backup location with name you choose above
sudo -u $SPLUNK_USER tar -cvf $BACKUP_LOCATION/$BACKUP_NAME $SPLUNK_HOME/etc

#Executing the download from Splunk of the upgrade version you choose above
sudo -u root wget -O $BASE_FOLDER/$DOWNLOADED_FILE $URL

#Extract the downloaded upgrade over the previously installed splunk
cd $BASE_FOLDER
sudo -u root tar -xvzf $DOWNLOADED_FILE

#Give the changes ownership to the splunk user
sudo -u root chown -R $SPLUNK_USER:$SPLUNK_GROUP $SPLUNK_HOME

#Launch splunk and complete the upgrade
sudo -u $SPLUNK_USER $SPLUNK_HOME/bin/splunk start --accept-license --answer-yes --no-prompt
echo ""Splunk has been upgraded""

#cleaning up downloaded file
sudo -u root rm -rf $DOWNLOADED_FILE
"

### ========================================== ###
###              SCRIPT EXECUTION              ###
### ========================================== ###

#The remote script above is executed below and will go through your hostlist file and host by host create a backup and upgrade each splunk instance.

echo "In 5 seconds, will run the following script on each remote host:"
echo
echo "===================="
echo "$REMOTE_UPGRADE_SCRIPT"
echo "===================="
echo
sleep 5
echo "Reading host logins from $HOST_FILE"
echo
echo "Starting."
for i in `cat "$HOST_FILE"`; do
if [ -z "$i" ]; then
continue;
fi
echo "---------------------------"
echo "Installing to $i"
ssh -i $PRIVATE_KEY_PATH -t "$SSH_USER@$i" "$REMOTE_UPGRADE_SCRIPT"
#ssh -t "$i" "$REMOTE_UPGRADE_SCRIPT"
done
echo "---------------------------"
echo "Done"

 

If you have any questions or concerns regarding the script or just don’t feel quite as comfortable with Splunk upgrades, feel free to contact us and we’ll be happy to lend a helping hand.

 

How to Combine Multiple Data Sources in Splunk SPL

      By: Yetunde Awojoodu  |  Splunk Consultant

 

Depending on your use case or what you are looking to achieve with your Splunk Processing Language (SPL), you may need to query multiple data sources and merge the results. The most intuitive command to use when these situations arise is the “join” command, but it tends to consume a lot of resources – especially when joining large datasets. I will be describing a few other commands or functions that can be applied when combining data from multiple sources in Splunk, including their benefits and limitations.

“OR” Boolean Operator

The most common use of the “OR” operator is to find multiple values in event data, e.g. “foo OR bar.” This tells the program to find any event that contains either word. However, the “OR” operator is also commonly used to combine data from separate sources, e.g. (sourcetype=foo OR sourcetype=bar OR sourcetype=xyz). Additional filtering can also be added to each data source, e.g., (index=ABC loc=Ohio) OR (index=XYZ loc=California). When used in this manner, Splunk runs a single search, looking for any events that match any of the specified criteria in the searches. The required events are identified earlier in the search before calculations and manipulations are applied.

Syntax for “OR” Operator:

(<search1>) OR (<search2>) OR (<search3>)

Pros:

  • – Merges fields and event data from multiple data sources
  • – Saves time since it does only a single search for events that match specified criteria and returns only the applicable events before any other manipulations

Cons:

  • – Only used with base searches. Does not allow calculations or manipulations per source, so any further calculations or manipulations will need to be performed on all returned events

Example: In the example below, the OR operator is used to combine fields from two different indexes, and the results are grouped by customer_id, which is common to both data sources.
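As a rough sketch of that kind of search (the index, sourcetype, and field names here are hypothetical):

(index=web_orders sourcetype=orders) OR (index=crm sourcetype=accounts)
| stats values(order_id) AS orders, values(account_status) AS account_status BY customer_id

The base searches are combined with OR and filtered first; the stats command then groups the merged events by the shared customer_id field.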

Append Command

Append is a streaming command used to add the results of a secondary search to the results of the primary search. The results from the append command are usually appended to the bottom of the results from the primary search. After the append, you can use the table command to display the results as needed. Note that the secondary search must begin with a generating command. It is important to also note that append searches are not processed like subsearches where the subsearch is processed first. They are run at the point they are encountered in the SPL.

Syntax for Append:

<primary search> ... | append [<secondary search>]

Pros:

  • – Displays fields from multiple data sources

Cons:

  • – Subject to a maximum result rows limit of 50,000 by default
  • – The secondary search must begin with a generating command
  • – It can only run over historical data, not real-time data

Example: In the example below, the count of web activities on the Splunk User Interface is displayed from _internal index along with count per response from the _audit index.
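A sketch of that kind of search (the exact field choices are illustrative, not the original dashboard search):

index=_internal sourcetype=splunkd_ui_access
| stats count BY uri_path
| append
    [ search index=_audit action=search
      | stats count BY info ]
| table uri_path info count

The secondary search in brackets begins with the generating search command, and its rows are tacked onto the bottom of the primary results, sharing the count field.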

The last four rows are the results of the appended search. Both result sets share the count field. You can see that the append command just tacks on the results of the subsearch to the end of the previous search, even though the results share the same field values.

Multisearch Command

Multisearch is a generating command that runs multiple streaming searches at the same time. It requires at least two searches and should only contain purely streaming operations such as eval, fields, or rex within each search. One major benefit of the multisearch command is that it runs multiple searches simultaneously rather than sequentially as with the append command. This could save you some runtime especially when running more complex searches that include multiple calculations and/or inline extractions per data source. Results from the multisearch command are interleaved, not added to the end of the results as with the append command.

Syntax for the Multisearch Command:

| multisearch [<search1>] [<search2>] [<search3>] ...

Since multisearch is a generating command, it must be the first command in your SPL. It is important to note that the searches specified in square brackets above are not actual subsearches. They are full searches that produce separate sets of data that will be merged to get the expected results. A subsearch is a search within a primary or outer search. When a search contains a subsearch, Splunk processes the subsearch first as a distinct search job and then runs the primary search.

Pros:

  • – Merges data from multiple data sources
  • – Multisearch runs searches simultaneously, thereby saving runtime with complex searches
  • – There is no limit to the number of result rows it can produce
  • – Results from the multisearch command are interleaved allowing for a more organized view

Cons:

  • – Requires that the searches are entirely distributable or streamable
  • – Can be resource-intensive due to multiple searches running concurrently. This needs to be taken into consideration since it can cause search heads to crash.

Example: In the example shown below, the multisearch command is used to combine the action field from the web_logs index and queue field from the tutorial_games index using eval to view the sequence of events and identify any roadblocks in customer purchases. The results are interleaved using the _time field.
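A simplified sketch of that search (the field handling is an assumption based on the description above):

| multisearch
    [ search index=web_logs
      | eval step=action ]
    [ search index=tutorial_games
      | eval step=queue ]
| table _time index step

Each bracketed search contains only streaming operations (search and eval), which is what multisearch requires; the combined results come back interleaved by _time.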

Union Command

Union is a generating command that is used to combine results from two or more datasets into one large dataset. The behavior of the union command depends on whether the dataset is a streaming or non-streaming dataset. Centralized streaming or non-streaming datasets are processed the same as append command while distributable streaming datasets are processed the same as multisearch command.

Syntax for Union Command:

| union [<search1>] [<search2>] … OR … | union [<search>]

However, with streaming datasets, instead of this syntax:
<streaming_dataset1> | union <streaming_dataset2>

Your search is more efficient with this syntax:
... | union <streaming_dataset1>, <streaming_dataset2>

Pros:

  • – Merges data from multiple data sources
  • – Can process both streaming and non-streaming commands, though behavior will depend on the command type
  • – Has the added benefit of the maxout argument, which specifies the maximum number of results to return from the subsearch. The default is 50,000 results, which corresponds to the maxresultrows setting in the [searchresults] stanza of the limits.conf file.

Example: The example below is similar to the multisearch example provided above and the results are the same. Both searches are distributable streaming, so they are “unioned” by using the same processing as the multisearch command.
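A sketch of that union, mirroring the multisearch example (names are again hypothetical):

| union maxout=50000
    [ search index=web_logs
      | eval step=action ]
    [ search index=tutorial_games
      | eval step=queue ]
| table _time index step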

In the second example below, because the head command is a centralized streaming command rather than a distributable streaming command, any subsearches that follow the head command are processed using the append command. In other words, when a command forces the processing to the search head, all subsequent commands must also be processed on the search head.

Comparing the Four Options

The table below shows a comparison:

| # | OR | Append | Multisearch | Union |
|---|----|--------|-------------|-------|
| 1 | Boolean operator | Streaming command | Generating command | Generating command |
| 2 | Used in between searches | Used in between searches | Must be the first command in your SPL | Can be either the first command or used in between searches; choose the most efficient method based on the command types needed |
| 3 | Results are interleaved | Results are added to the bottom of the table | Results are interleaved | Results are interleaved based on the time field |
| 4 | No limit to the number of rows that can be produced | Subject to a maximum of 50,000 result rows by default | No limit to the number of rows that can be produced | Default of 50,000 result rows with non-streaming searches; can be changed using the maxout argument |
| 5 | Requires at least two base searches | Requires a primary search and a secondary one | Requires at least two searches | Requires at least two searches that will be "unioned" |
| 6 | Does not allow use of operators within the base searches | Allows both streaming and non-streaming operators | Allows only streaming operators | Allows both streaming and non-streaming operators |
| 7 | Does only a single search for events that match specified criteria | Appends results of the "subsearch" to the results of the primary search | Runs searches simultaneously | Behaves like multisearch with streaming searches and like append with non-streaming |
| 8 | Transforming commands such as chart, timechart, or stats cannot be used within the searches but can be specified after | | Transforming commands such as chart, timechart, or stats cannot be used within the streaming searches | Behaves like multisearch with streaming searches and like append with non-streaming |

I hope you now have a better understanding of the different options presented for combining multiple data sources and will make the most optimized choice for your use case.

Want to learn more about combining data sources in Splunk? Contact us today!

Spruce Up Your Dashboard Panels with Dynamic Y-Axis Labels/Units

      By: Marvin Martinez  |  Splunk Consultant

 

Dashboards in Splunk, and the panels within those dashboards, are extremely versatile and powerful. Sometimes, however, wide variances in the underlying data being analyzed and displayed in those panels can make it hard to relay that information effectively, especially when the data spans several orders of magnitude or when you’d like the axes to contain dynamic labels depending on the data being shown. Today, we’ll look at a relatively small, but mighty, method that can be implemented in your dashboards to help bolster the way your dashboard panels display your key data.

In this example, we’ll examine a case where a panel is displaying metrics for daily ingestion for a given index. The data is coming from a summary index and shows ingested megabytes (MB) over time. However, depending on the index, the ingestion may be better displayed in terms of GB, or even TB. In this case, the y-axis of the chart would not look its best if the values looked like “2,000,000 MB” as that is not a practical and easy-to-read display of the data.

The first step is to determine what units are most prudent to display for the index in question. To do this, a global base search is leveraged to determine the expected units. In the example search below, the base search retrieves all the MB values for the specified index, takes an average of the values, and compares it against thresholds for TB (1024*1024 MB), GB (1024 MB), and MB (default case). Depending on the value of the average, the search returns a specific “Metric” (or “Units”) result that is inserted into a dashboard token that will be used later.
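As a sketch of that base search in Simple XML (the summary index fields, the $index_tok$ dropdown token, and the token name metricTok are assumptions for illustration), it could look something like this:

<search id="base_units">
  <query>
    index=summary search_name="ingestion_summary" idx=$index_tok$
    | stats avg(MB) AS avgMB
    | eval Metric=case(avgMB >= 1024*1024, "TB", avgMB >= 1024, "GB", true(), "MB")
    | table Metric
  </query>
  <done>
    <set token="metricTok">$result.Metric$</set>
  </done>
</search>

The <done> handler sets the metricTok token to the Metric value returned by the search so the rest of the dashboard can reference it.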

Once the desired units are determined, there are only two things that need to be done to ensure any visualization going forward can leverage it accurately and successfully:

  1. Ensure your chart SPL is doing the same conversion (“normalization”) to ensure data integrity.
  2. Assign the derived unit’s value as the y-axis label.

In the visualization SPL, the eval statement below was included to “normalize” the data and ensure that the actual values match the units that are noted by the y-axis label. Do this for any other values that may need to be normalized as well. If necessary, the “Metric” field can be removed in your SPL later, or you could just use the token in the CASE statement itself. The “Metric” field was just created for ease of use in the subsequent eval expression.

|eval Metric = $metricTok|s$, Actual = case(Metric = "MB", Actual, Metric = "GB", round(Actual/1024,2), Metric = "TB", round(Actual/1024/1024,2))

Finally, in the options for your chart, add an entry for “charting.axisTitleY.text” and assign the dashboard token set in your base search as the value, as shown below.
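In Simple XML, that option is a single line in the chart element (assuming the token from the base search is named metricTok):

<option name="charting.axisTitleY.text">$metricTok$</option>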

The end result, as shown in the images below, is a dynamic y-axis label that adjusts to the units that best suit the underlying data and make it easy to read instead of having to wade through various zeroes in the intervals! The images below are the same chart displaying data from three separate indexes with varying ingestion totals, each displaying in the units best suited for the selected index.

Using tokens allows for great versatility and power in your dashboards. Now, even the axes can be easily configured to dynamically adjust to your wide-ranging values.

Want to learn more about customizing your Splunk dashboards? Contact us today!

How to Enable Splunk Boot-Start Using Systemd

      By: Jon Walthour  |  Senior Splunk Consultant, Team Lead

 

Splunk switched to enabling boot-start under systemd by default back in version 7.2.2. It did this because systemd has become the default system initialization and service manager for most major Linux distros. Splunk switched back to SysV init in versions 7.3 through 8.1.0 because of shortcomings in how Splunk was utilizing systemd for service startup and shutdown. Since the startup and shutdown actions prompted for root credentials, this broke many automated processes out in the wild. It wasn’t until version 8.1.1 that an option was added to the “enable boot-start” command to install Polkit rules, which grant a non-root user like “splunk” enough centralized system control to start and stop the Splunk systemd service. Starting with version 8.1.1, the preferred method for setting up boot-start for Splunk Enterprise is via systemd.

What are the advantages of using systemd, you might ask? Plenty. First, systemd offers parallel processing to allow more to be done concurrently during system boot-up. Additionally, it provides a standard framework for expressing dependencies between processes. This means, in the case of the Splunk systemd initialization, that Splunk’s startup can be made dependent on network services starting successfully. The configuration of systemd is standardized with unit text files and does not require the creation of custom scripts.

Systemd also offers enhancements specifically to Splunk in that it provides a way to monitor and manage the splunkd service independent of Splunk itself. It provides tools for debugging and troubleshooting boot-time and service-related issues with Splunk — again, independent of the Splunk software itself. Most importantly, systemd allows for the use of Linux control groups (cgroups), which forms the backbone of the workload management features in Splunk Enterprise.

Below are the steps to enable Splunk to start at system boot under systemd as well as other recommended operating system configurations for Splunk:

1. Install Polkit (if not already installed).

sudo su -
yum -y update
yum -y install polkit

sudo /opt/splunk/bin/splunk enable boot-start -systemd-managed 1 -systemd-unit-file-name splunk -create-polkit-rules 1 -user splunk -group splunk

NOTE: If you get the message “CAUTION: The system has systemd version < 237 and polkit version > 105. With this combination, polkit rule created for this user will enable this user to manage all systemd services. Are you sure you want to continue [y/n]?”, select “y,” then create the following two files and run the following chmod command:

vi /etc/polkit-1/rules.d/10-Splunkd.rules

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.systemd1.manage-units" &&
        subject.user == "splunk") {
        try {
            polkit.spawn(["/usr/local/bin/polkit_splunk", ""+subject.pid]);
            return polkit.Result.YES;
        } catch (error) {
            return polkit.Result.AUTH_ADMIN;
        }
    }
});

vi /usr/local/bin/polkit_splunk

#!/bin/bash -x
COMM=($(ps --no-headers -o cmd -p $1))

if [[ "${COMM[1]}" {== "start" ]] ||
     [[ "${COMM[1]}" ==} "stop" ]] ||
     [[ "${COMM[1]}" == "restart" ]]; then

         if [[ "${COMM[2]}" {== "Splunkd" ]] ||
            [[ "${COMM[2]}" ==} "Splunkd.service" ]]; then
                 exit 0
         fi
fi

exit 1

chmod 755 /usr/local/bin/polkit_splunk

2. Edit the splunk.service file and make the following adjustments:

vi /etc/systemd/system/splunk.service

File created in /etc/systemd/system/splunk.service:
#This unit file replaces the traditional start-up script for systemd
#configurations, and is used when enabling boot-start for Splunk on
#systemd-based Linux distributions.

[Unit]
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=600

LimitCORE=0
LimitDATA=infinity
LimitNICE=0
LimitFSIZE=infinity
LimitSIGPENDING=385952
LimitMEMLOCK=65536
LimitRSS=infinity
LimitMSGQUEUE=819200
LimitRTPRIO=0
LimitSTACK=infinity
LimitCPU=infinity
LimitAS=infinity
LimitLOCKS=infinity
LimitNOFILE=1024000
LimitNPROC=512000
TasksMax=infinity

SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
User=splunk
Group=splunk

Delegate=true
CPUShares=1024
MemoryLimit=<value>
PermissionsStartOnly=true
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/cpu/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/memory/system.slice/%n"

[Install]
WantedBy=multi-user.target

Change or check the following settings in the splunk.service file:

  • – Change TimeoutStopSec to 600
  • – Add all Limit____ lines and TasksMax
  • – Check user and group
  • – Set MemoryLimit to the total system memory available in bytes
  • – Check that both “ExecStartPost” chown commands use the right user:group

3. Add a systemd service for disabling THP (for example, /etc/systemd/system/disable-thp.service, so the name matches the systemctl commands in step 4):

[Unit]
Description=Disable Transparent Huge Pages (THP)

[Service]
Type=simple
ExecStart=/bin/sh -c "echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled && echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag"

[Install]
WantedBy=multi-user.target

4. Finally, enable the whole thing and reboot:

sudo systemctl daemon-reload

sudo systemctl start disable-thp
sudo systemctl enable disable-thp

sudo systemctl start splunk
sudo systemctl enable splunk

shutdown -r now

5. Lastly, check your work to ensure (a) Splunk was started under systemd and (b) transparent huge pages are disabled and ulimits are set according to the values defined in the systemd unit file.

$ ps -ef|grep splunkd
splunk 3848 1 3 12:10 ? 00:00:04 splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd
splunk 4731 3848 0 12:10 ? 00:00:00 [splunkd pid=3848] splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd [process-runner]
splunk 4931 4731 0 12:10 ? 00:00:00 /opt/splunk/bin/splunkd instrument-resource-usage -p 8089 --with-kvstore
splunk 5504 5442 0 12:12 pts/0 00:00:00 grep --color=auto splunkd

$ splunk status
splunkd is running (PID: 3848).
splunk helpers are running (PIDs: 4731 4749 4853 4931).

cat /opt/splunk/var/log/splunk/splunkd.log | grep ulimit

06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: virtual address space size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: data segment size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: resident memory size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: stack size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: core file size: 0 bytes
06-01-2021 12:05:17.513 +0000 WARN ulimit - Core file generation disabled.
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: data file size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: open files: 1024000 files
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: user processes: 512000 processes
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: cpu time: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Linux transparent hugepage support, enabled="never" defrag="never"
06-01-2021 12:05:17.513 +0000 INFO ulimit - Linux vm.overcommit setting, value="0"
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: virtual address space size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: data segment size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: resident memory size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: stack size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: core file size: 0 bytes
06-01-2021 12:10:55.997 +0000 WARN ulimit - Core file generation disabled.
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: data file size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: open files: 1024000 files
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: user processes: 512000 processes
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: cpu time: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Linux transparent hugepage support, enabled="never" defrag="never"
06-01-2021 12:10:55.997 +0000 INFO ulimit - Linux vm.overcommit setting, value="0"

Want to learn more about Splunk boot-start and systemd? Contact us today!

The 3 Most Common Splunk Issues and How to Solve Them

      By: Aaron Dobrzeniecki  |  Splunk Consultant

 

After working with hundreds of Splunk customers over the years, I have found that there are a few main issues that customers deal with on a daily basis. The three main issues I will be discussing today are troubleshooting issues with data quality (source types not parsing properly), issues with search performance, and finally, issues with high CPU and Memory usage.

Issue 1: Data Quality

One way to check if your data is being parsed properly is to search on it in Splunk. Provide Splunk with the index and sourcetype that your data source applies to. If you see that your data does not look like it was broken up into separate correct events, we have a problem.

Another way to check the quality of your data source is to run the below search in your environment. This is a modified version of a search from the Splunk Monitoring Console -> Indexing -> Inputs -> Data Quality. I have modified this search so that you can run this on any of your Splunk Search Heads.

index=_internal splunk_server=* source=*splunkd.log* (log_level=ERROR OR log_level=WARN) (component=AggregatorMiningProcessor OR component=DateParserVerbose OR component=LineBreakingProcessor)

| rex field=event_message "Context: source(::|=)(?<context_source>[^\\|]*?)\\|host(::|=)(?<context_host>[^\\|]*?)\\|(?<context_sourcetype>[^\\|]*?)\\|"

| eval data_source=if((isnull(data_source) AND isnotnull(context_source)),context_source,data_source), data_host=if((isnull(data_host) AND isnotnull(context_host)),context_host,data_host), data_sourcetype=if((isnull(data_sourcetype) AND isnotnull(context_sourcetype)),context_sourcetype,data_sourcetype)

| stats count(eval(component=="LineBreakingProcessor" OR component=="DateParserVerbose" OR component=="AggregatorMiningProcessor")) as total_issues dc(data_host) AS "Host Count" dc(data_source) AS "Source Count" count(eval(component=="LineBreakingProcessor")) AS "Line Breaking Issues" count(eval(component=="DateParserVerbose")) AS "Timestamp Parsing Issues" count(eval(component=="AggregatorMiningProcessor")) AS "Aggregation Issues" by data_sourcetype

| sort - total_issues

| rename data_sourcetype as Sourcetype, total_issues as "Total Issues"

See the results of the search below:

You will want to run the above search for at least the last 15 minutes. As you can see, it provides you with the number of Line Breaking issues, Timestamp Issues, and Aggregation Issues for each of your problematic sourcetypes. You can also drill down into the data by clicking on one of the numbers in the columns. In the screenshot below, you will see a warning for issues parsing a timestamp:

Taking the proper steps to make sure your data is parsed correctly will make it a lot easier to get value out of your data. The most important step is testing the ingestion of your data in a test instance or environment before implementing in prod.

To correctly parse your data, you need the following 8 settings in props:

On your Splunk Enterprise Boxes:

  • – LINE_BREAKER
  • – SHOULD_LINEMERGE
  • – MAX_TIMESTAMP_LOOKAHEAD
  • – TRUNCATE
  • – TIME_FORMAT
  • – TIME_PREFIX

On the UF:

  • – EVENT_BREAKER_ENABLE
  • – EVENT_BREAKER

Learn more about “the Great 8” in the TekStream blog.
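As a minimal sketch, a props.conf stanza using these settings might look like the following (the sourcetype name, regexes, and time format are placeholders for your own data):

# props.conf on the indexers / heavy forwarders
[my_custom:sourcetype]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
MAX_TIMESTAMP_LOOKAHEAD = 25
TRUNCATE = 10000

# props.conf on the universal forwarders
[my_custom:sourcetype]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)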

Issue 2: Search Performance

Next, let’s discuss search performance. When creating and crafting searches in Splunk, you want to make sure you give Splunk as much detail as possible about the data you are trying to search. There are four important fields that Splunk looks at to determine which data to bring back for you: index, sourcetype, source, and host. When running searches in Splunk, make sure to include the index or indexes where your data lives, provide the sourcetype or sourcetypes, your source or sources, which is the file path for your data, and finally the host or hosts that are sending the data into Splunk.

When creating a search that will produce statistical information, filter the data down to exactly what you are looking for before you apply your calculating commands. When filtering, avoid or limit using the NOT operator; try to tell Splunk what to include rather than what to exclude. There is always the scenario where the list of values you want to include is very large. In cases like that, you can create a macro containing everything you want Splunk to search on, use a lookup to include all the values, or at least limit the number of NOTs that are used. Filtering your data to exactly what you are looking for is crucial, as it allows Splunk to return more precise results more quickly. If you only need certain fields but your search returns a large number of them, use the fields command to tell Splunk which fields to bring back so it does not return every single field. Limiting the number of fields Splunk brings back will improve search performance.
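For example, a sketch of a well-scoped search (the index, sourcetype, and field names are hypothetical):

index=web sourcetype=access_combined host=web* status=404
| fields _time, host, status, uri_path
| stats count BY uri_path
| sort - count

The base search names the index, sourcetype, and hosts up front, and the fields command keeps Splunk from pulling back every extracted field before the stats calculation.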

Search performance also includes scheduled searches, reports, and alerts. You want to make sure you do not have too many scheduled searches with the same cron schedule or scheduled time. If you have more searches than cores on your Search Head running at one time, that means that all of your SH cores are being used up. This could cause scheduled searches to be delayed or skipped and could also delay some ad hoc searches from running.

Issue 3: High Resource Usage

High resource usage is the final topic I wanted to discuss. There are a large number of factors that play into how well our Splunk servers are performing. If you are noticing performance issues with your Splunk servers, make sure your boxes are up to Splunk’s best practice recommendations. How well your searches are performing can also determine how well your boxes are performing. If you are running bad searches, you will notice that the CPU and memory usage of your Splunk indexers and Search Heads will increase.

One way to check the resource usage of your Splunk boxes is to use the Monitoring Console -> Resource Usage -> Resource Usage: Instance. On this dashboard, you are able to see how much CPU and memory each of your Splunk boxes are using. I have provided a screenshot of part of this dashboard below:

Another factor that plays into high CPU and memory usage could be the use of Python scripts on your Splunk boxes. There are many Splunk Technical Add-ons that have Python scripts embedded in the app. If you are experiencing high CPU and memory usage, you may want to check on your server to see what processes are using up the most CPU and memory.

Recently, I have been experiencing very high CPU and memory usage due to Splunk accelerated data models. If you are using accelerated data models (DMAs), make sure that only the ones you actually want are accelerated. If you have DMAs accelerated that are not being used, Splunk will still run the acceleration searches to build and maintain those data models, and that will still use up CPU and memory. A huge tip I have for those running Enterprise Security or simply using DMAs: disable and unaccelerate the DMAs you are not using. To permanently disable DMAs, follow the path below:

Settings -> Data Inputs -> Data Model Acceleration Enforcement

Want to learn more about solving common Splunk issues? Contact us today!

Using the ITSI_Summary Index and Pseudo Entities to Show Overall Health Scores

      By: Brent McKinney  |  Splunk Consultant

In Splunk ITSI, you may come across times where you want to create an overall health service and need health scores from services that belong to private ITSI teams. However, once a service in ITSI belongs to a private team, it can no longer act as a dependency to a service outside its team. In these situations, you may need to manually pull health scores from the ITSI summary index as KPIs. In this blog, we will walk through a use case where we do exactly that, as well as use ITSI service names as pseudo entities to feed these KPIs. The goal here is for ITSI administrators to manually create these dependencies using the itsi_summary index, while still restricting private teams’ access to only their services.

Use Case: Say we have 20 various applications in our environment, and we have an independent ITSI service analyzer built for each. Each application service analyzer belongs to its own private ITSI team, has a number of child services and KPIs attached, and lives in one of three AWS regions: East, West, or Central. We want to create an overall health service analyzer that solely represents the overall health of each region and the applications within it. The challenge here is that because the application services in ITSI are in private teams, we can’t simply add them as dependencies to the overall health service. We will walk through how we can use the itsi_summary index, along with pseudo entities, to achieve the same results.

In an end state, we want to build the following Service Analyzer, with each region containing its applications as pseudo entities:

The “Overall Application Health” service will have no KPIs and simply depend on the region child services for its health score. So to begin, we can create the above four services in ITSI, and add the region services’ overall health scores as dependencies to the “Overall Application Health” service. Each region service will have one KPI, “Overall Health,” and the entities will be made up of the applications within that region.

Now, we will begin defining this “Overall Health” KPI using a search. For this, we will need to know the health score of each application and to which region it belongs. Our goal here is as follows: Create a KPI search that pulls the overall health score of each Application, and filter each application based on region. We can achieve this by utilizing the itsi_summary index and prebuilt macros.

Note: The itsi_summary index uses ONLY service IDs to get information on a particular service, so we’ll need to use the service_kpi_list macro to get the actual service names. This macro contains a lookup that has both the service name and ID, and will allow us to correlate and more intuitively define and filter our applications in the itsi_summary index.

index=itsi_summary
| join itsi_service_id
[| `service_kpi_list`
| search service_name=*east AND service_name="*Overall Health"
| rename serviceid as itsi_service_id
]
| rename alert_value as health_score
| table health_score service_name

The above SPL will be used for our KPI searches and contains a sub-search. The inner search, which is run first, uses a macro to pull back a list of each service in the environment, both name and ID, along with the KPIs within each. From here, we filter to bring back the appropriate overall health score of our application and region it belongs to. (Note: This filter may differ depending on the naming convention of your services. The above is an example to filter health scores for the east region service.) We will have a unique KPI search for each region, so our goal here is to filter by region, and only grab the overall health KPI for each application. Once we have the applications we want, we can join this with the itsi_summary index in the outer search to obtain the actual health scores of those applications we returned in the inner search. In short, the inner search gives us the names of the applications we want, and the outer search correlates them to their health scores.

Now that we’re pulling back the health scores that we need, we can define what the threshold field should be, how to split the threshold fields by application, and how to aggregate the threshold field overall. In our example, the threshold field should be “health_score.” Because the application names may not be direct data sources in your environment, but rather the names of services, we can simply define them as we did in the KPI search above, rather than import them in ITSI. Entities defined this way are referred to as “pseudo entities,” since they exist in the KPI search but not definitively in ITSI. For this, we will choose “service_name” for “Entity Split by field.” This allows us to split and monitor the applications contributing to each region without having to manage the entities themselves in ITSI.

For calculation, we want the latest of “health_score” as entity value and the Average of entity value as aggregate. This says that for each application, take the latest recorded health score, and for the overall health of the region, take the average of all health scores.

We will repeat these steps for each region, editing our SPL to filter accordingly. Once these services are saved and enabled, we can see how each region is performing and contributing to the overall health of our environment, and how each application is contributing to the health of each region! This approach allows us to have applications built out in private teams in ITSI, while still allowing us to have a service that shows us ALL of the applications’ overall health scores.

Want to learn more about the ITSI Summary and how it relates to Overall Health Scores? Contact us today!

Wrangling Your Splunk Indexer Storage

      By: Jon Walthour  |  Senior Splunk Consultant, Team Lead

I have heard many customers, vexed by overflowing storage attached to their indexers, wonder why warm buckets are not rolling to a separate cold storage filesystem when they’re supposed to. Instead, the hot/warm volume on their indexers fills up and the indexers enter automatic detention because the filesystem has less than minFreeSpace MB available. They’ve reviewed their configuration in indexes.conf. They’ve set up volumes with maxVolumeDataSizeMB set to a value 90-95% of the size of the filesystem on which they store hot and warm buckets. And yet, the problem continues. So, they come to me asking why Splunk is broken.

I tell them Splunk isn’t broken; it just doesn’t work the way they expect it to. To get accurate management of hot-warm storage, define every path by a volume, then use maxVolumeDataSizeMB to have the Volume Manager keep the size under control. Two keys:

  1. Define every “Path” using a volume. This includes:
    • – homePath
    • – coldPath
    • – summaryHomePath (for report accelerations)
    • – tstatsHomePath (for data model accelerations)
  2. Create three filesystems for classic storage and two filesystems for SmartStore indexing tiers to cover all the bases:
    • – /opt/splunk – for Splunk software, logs, temp files, etc.
    • – hot/warm volume
    • – cold volume (not needed for SmartStore)

The thing that usually makes managing hot/warm storage mysterious is that most folks don’t realize that data model accelerations default to “volume:_splunk_summaries/$_index_name/datamodel_summary,” and the “_splunk_summaries” volume defaults to “path = $SPLUNK_DB.” In most deployments, SPLUNK_DB is set to the same base directory as homePath, so the data model summaries end up as a subdirectory of homePath. However, the volume manager does not calculate storage the way “df -h” does; it works more like adding up “du -sk” on all the directories where the volume is defined, usually homePath and coldPath. As a result, data model summaries do not get added in, and Splunk admins find that their hot/warm volumes fill up and they can’t figure out why. Instead, define everything in hot/warm with the hot-warm volume to get accurate management. For those settings that cannot use volume definitions (e.g., thawedPath and tsidxStatsHomePath), use SPLUNK_DB and set it to a separate filesystem.

Recommendations

  • ⁃ bloomHomePath has no default. The indexer stores bloomfilter files for the index inline, inside the index bucket directories, using a small amount of storage, so it’s not worth separately defining this path; by default, it gets counted in homePath and coldPath.
  • ⁃ $SPLUNK_DB can be left to its default of $SPLUNK_HOME/var/lib/splunk or can be set to a separate filesystem. It does not need to be very large. It is used for tsidxStatsHomePath, which is an indexer-wide setting and, by default, set to $SPLUNK_DB/tsidxstats, and for thawedPath, which in most Splunk deployments is rarely used.

indexes.conf

[default]
homePath=volume:hot-warm/$_index_name/db
coldPath=volume:hot-warm/$_index_name/colddb
thawedPath=$SPLUNK_DB/$_index_name/thaweddb
summaryHomePath=volume:hot-warm/$_index_name/summary
tstatsHomePath=volume:hot-warm/$_index_name/datamodel_summary

[volume:hot-warm]
path = /splunkdata/hot_warm
maxVolumeDataSizeMB = #########

[volume:cold-thawed]
path = /splunkdata/cold_thawed
maxVolumeDataSizeMB = #########

[main]
homePath = volume:hot-warm/defaultdb/db
coldPath = volume:hot-warm/defaultdb/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
summaryHomePath = volume:hot-warm/defaultdb/summary
tstatsHomePath = volume:hot-warm/defaultdb/datamodel_summary
maxMemMB = 20
maxConcurrentOptimizes = 6
maxHotIdleSecs = 86400
maxHotBuckets = 10
maxDataSize = auto_high_volume

[history]
homePath = volume:hot-warm/historydb/db
coldPath = volume:hot-warm/historydb/colddb
thawedPath = $SPLUNK_DB/historydb/thaweddb
summaryHomePath = volume:hot-warm/historydb/summary
tstatsHomePath = volume:hot-warm/historydb/datamodel_summary
maxDataSize = 10
frozenTimePeriodInSecs = 604800

[summary]
homePath = volume:hot-warm/summarydb/db
coldPath = volume:hot-warm/summarydb/colddb
thawedPath = $SPLUNK_DB/summarydb/thaweddb
summaryHomePath = volume:hot-warm/summarydb/summary
tstatsHomePath = volume:hot-warm/summarydb/datamodel_summary

[_internal]
homePath = volume:hot-warm/_internaldb/db
coldPath = volume:hot-warm/_internaldb/colddb
thawedPath = $SPLUNK_DB/_internaldb/thaweddb
summaryHomePath = volume:hot-warm/_internaldb/summary
tstatsHomePath = volume:hot-warm/_internaldb/datamodel_summary
maxDataSize = 1000
maxHotSpanSecs = 432000
frozenTimePeriodInSecs = 2592000

[_audit]
homePath = volume:hot-warm/audit/db
coldPath = volume:hot-warm/audit/colddb
thawedPath = $SPLUNK_DB/audit/thaweddb
summaryHomePath = volume:hot-warm/audit/summary
tstatsHomePath = volume:hot-warm/audit/datamodel_summary

[_thefishbucket]
homePath = volume:hot-warm/fishbucket/db
coldPath = volume:hot-warm/fishbucket/colddb
thawedPath = $SPLUNK_DB/fishbucket/thaweddb
summaryHomePath = volume:hot-warm/fishbucket/summary
tstatsHomePath = volume:hot-warm/fishbucket/datamodel_summary
maxDataSize = 500
frozenTimePeriodInSecs = 2419200

# this index has been removed in the 4.1 series, but this stanza must be
# preserved to avoid displaying errors for users that have tweaked the index's
# size/etc parameters in local/indexes.conf.
#
[splunklogger]
homePath = volume:hot-warm/splunklogger/db
coldPath = volume:hot-warm/splunklogger/colddb
thawedPath = $SPLUNK_DB/splunklogger/thaweddb
disabled = true

[_introspection]
homePath = volume:hot-warm/_introspection/db
coldPath = volume:hot-warm/_introspection/colddb
thawedPath = $SPLUNK_DB/_introspection/thaweddb
summaryHomePath = volume:hot-warm/_introspection/summary
tstatsHomePath = volume:hot-warm/_introspection/datamodel_summary
maxDataSize = 1024
frozenTimePeriodInSecs = 1209600

[_telemetry]
homePath = volume:hot-warm/_telemetry/db
coldPath = volume:hot-warm/_telemetry/colddb
thawedPath = $SPLUNK_DB/_telemetry/thaweddb
summaryHomePath = volume:hot-warm/_telemetry/summary
tstatsHomePath = volume:hot-warm/_telemetry/datamodel_summary
maxDataSize = 256
frozenTimePeriodInSecs = 63072000

[_metrics]
homePath = volume:hot-warm/_metrics/db
coldPath = volume:hot-warm/_metrics/colddb
thawedPath = $SPLUNK_DB/_metrics/thaweddb
summaryHomePath = volume:hot-warm/_metrics/summary
tstatsHomePath = volume:hot-warm/_metrics/datamodel_summary
datatype = metric
#14 day retention
frozenTimePeriodInSecs = 1209600
metric.splitByIndexKeys = metric_name

# Internal Use Only: rollup data from the _metrics index.
[_metrics_rollup]
homePath = volume:hot-warm/_metrics_rollup/db
coldPath = volume:hot-warm/_metrics_rollup/colddb
thawedPath = $SPLUNK_DB/_metrics_rollup/thaweddb
summaryHomePath = volume:hot-warm/_metrics_rollup/summary
tstatsHomePath = volume:hot-warm/_metrics_rollup/datamodel_summary
datatype = metric
# 2 year retention
frozenTimePeriodInSecs = 63072000
metric.splitByIndexKeys = metric_name

Want to learn more about managing your Splunk Indexer storage? Contact us today!

How to Forward Data to Splunk Cloud: Architecture Options and Step-by-Step Instructions

      By: Forrest Lybarger & Khristian Pena  |  Splunk Consultants

Implementing Splunk Cloud prompts teams to make many decisions about their environment, from hardware specs to compliance standards. One of the big questions a team must answer is, “How will data be sent from devices like workstations and domain controllers to Splunk Cloud?” That is more complicated than it may seem. Besides the niche forwarding methods (e.g., the Splunk Stream App, which is usually mandatory for whatever data will use it), there are three options for forwarding data: directly via universal forwarder (UF), indirectly via intermediate forwarder (IF), or directly via a heavy forwarder (HF). Each of these methods has pros and cons that will be covered here, so anyone moving to Splunk Cloud can decide how they will forward data. These strategies also aren’t mutually exclusive; they can be mixed and matched depending on individual circumstances.

Option 1: Send Data to the Splunk Cloud via a Universal Forwarder

A flowchart depicting how universal forwarders send data directly through the firewall to Splunk Cloud.

The first and simplest option is to send data directly from source hosts to Splunk Cloud via a UF. This approach doesn’t require any additional hardware (unless a deployment server is used) and has no single point of failure. UFs are installed on every source host and are configured with the environment-specific Splunk Cloud forwarding app (downloadable from every Splunk Cloud Web UI). Problems emerge when considering the connection between the source hosts and Splunk Cloud. Firewall rules will need to be amended in order to allow outbound traffic from the source hosts. With a small environment, this ask is easy to implement and maintain, but once the environment scales up, there can be thousands of firewall rules to maintain. The cutoff for when this option is viable will depend on the customer, but in general, this setup is best for small environments.

Option 2: Send data to the Splunk Cloud via an Intermediate Forwarder

A flowchart depicting how an intermediate forwarder can be used to send data to the Splunk Cloud.

The second option is to send data to an intermediate forwarder before sending to Splunk Cloud. The intermediate forwarder will need to be on its own server in order to have the resources for processing large amounts of data. It is highly recommended to have at least two IFs to prevent a single point of failure (Splunk can load balance between IFs without a dedicated load balancer). IFs use the same software as UFs, so they are very lightweight and perform minimal processing. The main benefit of this architecture is to minimize firewall holes. With this approach, only the IFs will need special firewall rules maintained for them. You can also send heavy forwarder data through these IFs, though that will increase network load due to the increased size of parsed data. Another consideration with this plan is data transmission between data centers. If such a thing is not allowed, there will need to be IFs in every datacenter where there are UFs, and planning must happen to send data to the correct IF. Overall, this option is more scalable than the first, but requires more hardware and coordination.
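As a rough sketch, the outputs.conf pushed to the UFs for load balancing across two intermediate forwarders might look like this (hostnames and port are hypothetical):

[tcpout]
defaultGroup = intermediate_forwarders

[tcpout:intermediate_forwarders]
server = if01.example.com:9997, if02.example.com:9997
autoLBFrequency = 30

The UF automatically rotates between the listed servers, which is the built-in load balancing mentioned above.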

Option 3: Send Data to the Splunk Cloud via a Heavy Forwarder

A flowchart depicting how a heavy forwarder can be used to send data to the Splunk Cloud.

Finally, data can be sent to a heavy forwarder (HF) before going to Splunk Cloud. Similar to the previous option, the HF acts as an intermediary that requires its own hardware. The big change from the previous architecture is that the HF is a full Splunk instance and parses data sent through it. Parsed data will be significantly larger than unparsed data causing more network load, but will reduce load on indexers in Splunk Cloud and is necessary for some data (individual data source documentation will note if HFs are required). Two or more HFs are recommended to prevent a single point of failure. Like the previous option, this is more scalable than the first option, but requires more hardware and coordination.

In conclusion, the three architectures have their own purposes and can be used in tandem to fulfill each customer’s specific needs. For small environments where firewall management isn’t a problem, sending data directly from source hosts to Splunk Cloud is a viable option with limited extra expenditure. For larger environments, a mix of the other two methods is best. When there’s data that needs an HF, use the third approach. Use the second approach for other data to avoid excessive firewall rules. Whether or not to forward the HF data through an IF is up to the individuals. Doing so will use more network bandwidth, but sending to Splunk Cloud from the HF will require more firewall rules.

Forwarder Configuration to Splunk Cloud

Regardless of which architecture your organization decides to go with for sending local data to Splunk Cloud, you will need to install the Splunk Universal Forwarder software on your forwarding hosts and the credentials package from your Splunk Cloud deployment. The package is called Splunk Universal Forwarder Credentials, but you are able to install this app on your HFs as well as your UFs.

The Universal Forwarder Credentials file contains a custom certificate for your Splunk Cloud Deployment.

Download the Splunk Universal Forwarder Credentials:

  1. In your Splunk Cloud instance, go to Apps > Universal Forwarder.
  2. Click Download Universal Forwarder Credentials.
  3. Note the location of the downloaded file; it will be named splunkclouduf.spl.
  4. Copy the file to your /tmp folder on the instance that will be receiving the credentials or to your Deployment server.

A screenshot of the Splunk Cloud app illustrating where the Universal Forwarder Credentials can be accessed and downloaded.

Install the forwarder credentials on individual forwarders.

(We recommend managing all forwarders from your Deployment Server vs. manually updating each instance.)

This will only apply if you do not use a Deployment Server to manage your forwarders.

  1. Install the app on your forwarder by running this command: /opt/splunkforwarder/bin/splunk install app /tmp/splunkclouduf.spl
  2. When you are prompted for a login, use the username and password for the Universal Forwarder instance. The following message will display when you have successfully installed the credentials package: App ‘/tmp/splunkclouduf.spl’ installed.
  3. Restart the forwarder: /opt/splunkforwarder/bin/splunk restart

Want to learn more about forwarding data to the Splunk Cloud? Contact us today!