Optimizing Splunk Dashboards with Post-Process Searches

When creating Splunk dashboards, we often have the same search run multiple times to show different types of graphs or slight variations (e.g., one graph showing "allowed" traffic and another showing "blocked"). This adds overhead every time the dashboard is opened or refreshed, causing it to populate more slowly and increasing the demand on the Splunk infrastructure. It can also run into other limitations, such as user concurrent-search limits.

With proper optimization, a typical dashboard with 10 panels can run fewer than three Splunk queries instead of the 10 individual searches that would normally run. This is accomplished with post-process searches, which are easily added in the SimpleXML of the desired dashboard.

Starting Point of Post-process Searches

A search in Splunk returns either raw event data or transformed event data. Transformed event data has been aggregated by the search into statistical tables, which serve as the basis for visualizations. The primary transforming commands are:

  • Chart
  • Timechart
  • Top
  • Rare
  • Stats

The search that post-process searches are built on is known as the base search. The base search should always avoid returning raw events and instead return transformed results. This is largely due to one of the limitations of post-processing: it can only work with a maximum of 500,000 events from the base search, and anything beyond that is truncated without warning. To work within this limitation, it is best practice to use one of the transforming commands and, as always, refine your search as much as possible to reduce the number of results and shorten your search runtime.

The Documented Limitations of Post-Process Searches

The documentation provided on Splunk Docs shows a few limitations that you should consider before using post-process searches:

http://docs.splunk.com/Documentation/Splunk/6.2.5/Viz/Savedsearches#Post-process_searches

  • Chaining for multiple post-process searches is not currently supported for SimpleXML dashboards.
  • If the base search is a non-transforming search, the Splunk platform retains only the first 500,000 events returned. The post-process search does not process events in excess of this 500,000 event limit, silently ignoring them. This results in incomplete data for the post-process search. A transforming search as the base search helps avoid reaching the 500,000 event limitation.
  • If the post-processing operation takes too long, it can exceed the Splunk Web client's non-configurable timeout value of 30 seconds. This can result in a timeout due to an unresponsive splunkd daemon/service. This scenario typically happens when you use a non-transforming search as the base search.

Examples of the Basic Concepts

Splunk Search with non-transforming commands returning RAW results:
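
For example, a search with no transforming command, like the bare search below (using the same pan:threat data as the later examples), simply returns the matching raw events one by one:

sourcetype="pan:threat" action=allowed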

Splunk search with transforming command returning transformed results:
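
For example, adding a transforming command to the same search returns a statistics table instead of individual events:

sourcetype="pan:threat" action=allowed | stats count by app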

Examples of Post-process

There are many different ways to determine what should go in the base search and what should go in each post-process search. One method is to create all of the queries for your dashboard first and then find the common beginning shared by the searches, which becomes your base search. The parts that fall outside that commonality become the post-process searches. Keep in mind that if you have four Splunk queries and three share a commonality but the fourth is completely different, you can build the base search for the three common queries and let the fourth run as a normal query.

We will take the following 5 Splunk queries as our example for what we have determined to put into our new dashboard. If we just ran these in our dashboard, it would run 5 almost identical queries, taking up valuable search resources and counting against user search limits.

sourcetype="pan:threat" action=allowed | stats count by app
sourcetype="pan:threat" action=allowed | stats count by rule
sourcetype="pan:threat" action=allowed | stats count by category
sourcetype="pan:threat" action=allowed | stats count by signature
sourcetype="pan:threat" action=allowed | stats count, values(rule) as rule by dest_ip

As we can easily see, the commonality of the 5 queries is going to be:

sourcetype="pan:threat" action=allowed |

The issue with just taking that portion as your base search is that it will return raw results. If we review the 5 queries, they use 5 different fields, which means our transforming base search needs to include all of those fields.

sourcetype="pan:threat" action=allowed
| stats count by app, category, rule, signature, dest_ip, src_ip

If we continue our method of initially creating our dashboard with our 5 independent queries:

Then we can switch to the XML source view of the dashboard and start making our base search and post-process searches. Below is how the dashboard’s XML looks before using any post-process searches.

<dashboard>
  <label>Threat Dashboard</label>
  <row>
    <panel>
      <table>
        <title>Applications</title>
        <search>
          <query>sourcetype="pan:threat" action=allowed | stats count by app</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Rule</title>
        <search>
          <query>sourcetype="pan:threat" action=allowed | stats count by rule</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Category</title>
        <search>
          <query>sourcetype="pan:threat" action=allowed | stats count by category</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Signature</title>
        <search>
          <query>sourcetype="pan:threat" action=allowed | stats count by signature</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Rules by Destination IP</title>
        <search>
          <query>sourcetype="pan:threat" action=allowed | stats count, values(rule) as rule by dest_ip</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
  </row>
</dashboard>

We will create our base search with the following:

Base Search:    sourcetype="pan:threat" action=allowed | stats count by app, category, rule, signature, dest_ip, src_ip
Post-process 1: | stats sum(count) as count by app
Post-process 2: | stats sum(count) as count by rule
Post-process 3: | stats sum(count) as count by category
Post-process 4: | stats sum(count) as count by signature
Post-process 5: | stats sum(count) as count, values(rule) as rule by dest_ip

Once in the XML Source view, create your base search at the top, under the label but before the first row:

The base search id can be named anything (in this case it is "baseSearch"), but it is best to make it something easy to remember because you will need to use it throughout the dashboard. The base search id is referenced in each post-process search, which effectively prepends the base search to each post-process search. To create the base search, the id is placed inside the search tag at the top of the dashboard, before all of the panels.
<search id="{id name}">

The id name must be in double quotes and is case sensitive. Next, the transforming base search query is added inside the opening and closing query tags:
<query> {insert query here} </query>

After the query tags, any other supported tags can be used such as the timeframe tags including tokens created and assigned in the dashboard. Then close the search tag.
</search>

Next we will add the post-process searches to each of the panels on the dashboard. The time references should be removed since the base search controls the timeframe:

Similar to the base search, the post-process search references the base search id in its search tag.
<search base="{id name of base search}">

Next come the query tags, where the post-process search goes. This query should start with a pipe "|" because it will be appended to the base search as if it were all one query.
<query>{post-process search that starts with a pipe "|"}</query>

After the query tags, any other supported tags can be used except the timeframe tags since the post-process searches go off the timeframe of the base search. Then close the search tag.
</search>

After modifying all 5 of the post-process searches in the XML source, the dashboard will be ready to use the base search. If you run the dashboard and look at the current searches, there will only be 1 search compared to 5 searches. Below is how the dashboard’s XML looks after making the changes.

<dashboard>
  <label>Threat Dashboard</label>
  <!-- Base Search Called "baseSearch" (This can be named anything) -->
  <search id="baseSearch">
    <query>sourcetype="pan:threat" action=allowed | stats count by app, category, rule, signature, dest_ip, src_ip</query>
    <earliest>-24h@h</earliest>
    <latest>now</latest>
  </search>
  <row>
    <panel>
      <table>
        <title>Applications</title>
        <!-- post-process search 1 -->
        <search base="baseSearch">
          <query>| stats sum(count) as count by app</query>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Rule</title>
        <!-- post-process search 2 -->
        <search base="baseSearch">
          <query>| stats sum(count) as count by rule</query>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Category</title>
        <!-- post-process search 3 -->
        <search base="baseSearch">
          <query>| stats sum(count) as count by category</query>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Signature</title>
        <!-- post-process search 4 -->
        <search base="baseSearch">
          <query>| stats sum(count) as count by signature</query>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Rules by Destination IP</title>
        <!-- post-process search 5 -->
        <search base="baseSearch">
          <query>| stats sum(count) as count, values(rule) as rule by dest_ip</query>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
  </row>
</dashboard>

The use of post-process searches in dashboards might not always be possible if the panels share no common queries. In the situations where there is commonality, post-process searches should be used. This not only reduces the workload that each query requires but also reduces the likelihood of users reaching their search limits, especially if the dashboard has a large number of common panels.

Want to learn more about optimizing Splunk dashboards? Contact us today!

Step by Step Guide to Installing Splunk Insights for Infrastructure

By: Pete Chen | Splunk Consultant

Overview

Since the release of Splunk Insights for Infrastructure, I've heard from a few people who tried to install it and ran into some challenges along the way. There are some prerequisites to a successful installation, which will be covered in this blog. We'll talk a little about what Insights for Infrastructure is, installing the home instance, and installing remote instances.

Before we go any further, the environment I used in my installation consisted of:

  • 1 x Home Instance: 1 virtual CPU, 8 GB RAM, 128 GB HD
  • 1 x Remote Instance: 1 virtual CPU, 4 GB RAM, 80 GB HD

Both servers are virtual, with Microsoft Hyper-V as the hypervisor. The OS used for the servers is CentOS 7 (x86_64). Using the .iso from CentOS, the servers are installed as bare minimum servers.

What is Splunk Insights for Infrastructure?

Splunk Insights for Infrastructure is a new product offering from Splunk that aims to provide a faster and easier way to collect and gain insight from monitoring data across the servers in a technical infrastructure. While traditional Splunk offers licenses based on daily ingestion rate, Splunk Insights for Infrastructure pricing is based on storage (GB) per month. And if that wasn't enough to get you excited, the first 200GB is FREE!

At the present time, the only operating systems supported by Splunk Insights for Infrastructure are Linux distributions running kernel 2.6.32 or later, such as Red Hat Enterprise Linux 6.

After logging in, this is the grid view of the entities being monitored. The blocks are color-coded based on health. The server being monitored in this example is healthy.

Installing Splunk Insights for Infrastructure Base

From a base installation of CentOS 7, this is a description of the steps needed to complete the installation. Please keep in mind that this guide uses a .tgz installation (as opposed to rpm, deb, or dmg). Using a different version of Linux may change the commands used below.

Step-by-Step

Step 1: Prepare the server for Splunk by disabling the firewall service. By default, firewalld is enabled, which may block access to port 22 (ssh), 8000 (web access), and 8089 (Splunk admin port). The first command below stops the firewall service; the second command turns the service off for future restarts.
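
On CentOS 7 (the OS used in this guide), those two commands would typically be run as root:

systemctl stop firewalld
systemctl disable firewalld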

Step 2: SELinux is “Security-Enhanced” Linux. While this is helpful to secure a server, it interferes with the operations of Splunk Insights for Infrastructure. To disable this, use any text editor to change the SELinux configuration file. Change the value from “SELINUX=enforcing” to “SELINUX=disabled”.
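
For example, as root, either edit /etc/selinux/config by hand or make the change with sed; a reboot is needed for the new setting to fully take effect:

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config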

Step 3: Updating the server libraries and applications is never a bad idea. Installing updates can provide better security and add newer features and capabilities. This is not a required step.
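
On CentOS 7 this would be:

yum update -y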

Step 4: WGET allows a server to download an application from the web. This will help in downloading the software on the base server. On the remote servers, using a script to install monitoring services will also require WGET.
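
For example:

yum install -y wget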

Step 5: EPEL stands for “Extra Packages for Enterprise Linux”. The additional packages don’t conflict with existing standard Linux packages, and can add more functionality to the server. CollectD is a package found in EPEL (and not in standard Enterprise Linux) and will be necessary to configure remote servers for monitoring. Since it’s helpful to monitor the performance of the base server as well, this should be installed.
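
For example:

yum install -y epel-release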

Step 6: CollectD is a background process which collects, transfers, and stores performance data of the server. This data is the foundation of Splunk Insights for Infrastructure and determines the health of a server. Once CollectD is installed, the data collected will be sent to the base server via Splunk Universal Forwarder.
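
With the EPEL repository enabled in the previous step, CollectD can be installed with:

yum install -y collectd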

At this point, the prerequisite work is complete, and the server is ready to download and install Splunk Insights for Infrastructure.

Step 7: Use WGET to download the installation tar file directly to the server. The alternative is downloading the software locally, then having to find a way to transfer the installation file to the server. Using WGET makes it much simpler.
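
The download URL comes from your splunk.com account and changes with each release, so the URL below is only a placeholder:

wget -O splunk-insights-for-infrastructure.tgz "<download URL copied from splunk.com>"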

Step 8: To install Splunk, copy the installation file into the folder /opt. This will require root permissions. Once the file is copied, enter the command “tar -vxzf” followed by the file name. Tar is the application used to decompress the installation file. The subsequent letters also have value. V stands for Verbose. Z tells the application to decompress the file. X tells the application to extract the files. F tells the application a file name will be specified. Depending on how your Splunk user is set up, this may require root permissions.
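
Assuming the downloaded file is named splunk-insights-for-infrastructure.tgz (the real file name will include a version number), the commands would look something like:

cp splunk-insights-for-infrastructure.tgz /opt
cd /opt
tar -vxzf splunk-insights-for-infrastructure.tgz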

Step 9: This step is a precautionary step. Changing the ownership of the Splunk folder will ensure the Splunk user can run the software without permission concerns. The R makes the change recursive, so subordinate directories will also have their permissions changed. “splunk:splunk” changes the owner of folders to the user “splunk” (first), within the group “splunk” (second). This will need to be run as root.
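
Assuming the archive extracted to /opt/splunk, run as root:

chown -R splunk:splunk /opt/splunk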

Step 10: This is the standard Splunk start command. The first time Splunk is run, there will be a requirement to read through and accept the software license agreement. To skip this and accept the license automatically, use "--accept-license". This command assumes Splunk was installed in the /opt folder.
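
As the splunk user (per the ownership change in Step 9):

/opt/splunk/bin/splunk start --accept-license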

Step 11: Servers can restart for many reasons. If an application is not configured to run on start, it will have to be manually restarted after the server is back online. Running the “enable boot-start” creates an init script, which is used to start Splunk as the server is starting up.
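
As root, assuming the same /opt/splunk install path:

/opt/splunk/bin/splunk enable boot-start -user splunk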

Now, Splunk is set up, and should be accessible through a web browser by going to the site https://<hostname>:8000. When going to the URL, if a security certificate is not properly set (which it isn’t in this case), there will be a warning about the site not being secure. Advancing to the site is safe.

Installing Splunk Insights for Infrastructure Remote

Much like the installation of the base Insights for Infrastructure server, there are a few assumptions made in this document for the installation of remote services. This document will detail steps taken for a CentOS 7 server. The remote nature of the server simply means a different server, gathering its own metrics, and sending them to the base server for analysis. The prerequisite steps are the same as above.

Step by Step

Step 1: Prepare the server for Splunk by disabling the firewall service. By default, firewalld is enabled, which may block access to port 8089 (Splunk admin port). The first command stops the firewall service; the second command turns the service off for future restarts.

Step 2: SELinux is “Security-Enhanced” Linux. While this is helpful to secure a server, it interferes with the operations of Splunk Insights for Infrastructure. To disable this, use any text editor to change the SELinux configuration file. Change the value from “SELINUX=enforcing” to “SELINUX=disabled”.

Step 3: Updating the server libraries and applications is never a bad idea. Installing updates can provide better security and add newer features and capabilities. This is not a required step.

Step 4: WGET allows a server to download an application from the web. This will help in downloading the software on the remote servers; using a script to install monitoring services will also require WGET.

Step 5: EPEL stands for “Extra Packages for Enterprise Linux”. The additional packages don’t conflict with existing standard Linux packages, and can add more functionality to the server. CollectD is a package found in EPEL (and not in standard Enterprise Linux) and will be necessary to configure remote servers for monitoring.

Step 6: CollectD is a background process which collects, transfers, and stores performance data of the server. This data is the foundation of Insights for Infrastructure and determines the health of a server. Once CollectD is installed, the data collected will be sent to the base server via Splunk Universal Forwarder.

Step 7: Run the installation script found in the configuration page of the Splunk Insights for Infrastructure base. In this document, the default script will be used. In a production environment, key-value pairs can be added for troubleshooting, analysis, and filtering hosts. This will need to be run as root on the remote server.

Step 8: Once the script is run, a Collectd folder will be created in /opt. Browse to /opt/collectd/etc and modify collectd.conf. By default, core server metrics should be enabled.

At this point, the remote server should start aggregating metrics with Collectd and sending them to the base server through the Splunk Universal Forwarder. Within a few minutes, data should start to appear in Insights for Infrastructure.

If you have questions or need further help installing SII, please contact us today: 

TekStream Ready to Partner with Small Enterprises to Implement Splunk Insights for Infrastructure

TekStream is a Leading Implementation Provider for New Splunk Solution

    ATLANTA, GA, May 31, 2018 — TekStream Solutions, a dynamic Atlanta-based technology company specializing in digital transformation services and technical recruiting, today announced it is ready and available to partner with small enterprises to implement Splunk® Insights for Infrastructure, an analytics driven IT Operations tool for System Administrators and DevOps teams to collect, analyze, and monitor data from their on-premises, cloud or hybrid server infrastructures. TekStream is a Splunk professional services, reseller, and MSP partner with 100% of its team members being accredited Splunk architects; the company’s consultants hail from diverse backgrounds in operations, security, development, and consulting.

Splunk Insights for Infrastructure is a single download designed for individual Infrastructure teams of SysAdmins and DevOps teams who are responsible for up to 1,000 on-premises, cloud or hybrid server infrastructures. The new software solution provides a seamless experience for infrastructure monitoring and troubleshooting. Splunk Insights for Infrastructure simplifies how system administrators and DevOps teams find and fix infrastructure performance problems, enabling them to automatically correlate metrics and logs to monitor their IT environments – in an easier-to-use, more interactive, and lower-cost package.

“Our clients in the Commercial segment have different needs and, of course, smaller budgets, than large enterprises,” said Judd Robins, Executive Vice President of Sales at TekStream. “Splunk Insights for Infrastructure gives companies of any size a monitoring product they can get started with quickly, easily and for free. There is a lot of demand among our Commercial clients for a robust yet affordable monitoring solution to support their digital transformation efforts, and we are excited to be able to help them take advantage of it.”

The smallest IT environments – up to approximately 50 servers, with 200GB in total storage – are charged no licensing fee for Splunk Insights for Infrastructure. The 200GB free tier includes Community support. For tiers greater than 200GB, Base support is included in the paid license price. Splunk’s Base support includes all major and minor software updates and customer support. If the company is growing and needs to move into larger infrastructure environments, they can easily upgrade to Splunk® Enterprise, the leading analytics platform for machine data.

“TekStream offers a full array of Splunk services to companies that wish to implement Splunk Insights for Infrastructure,” said Robins. “As a Splunk partner, TekStream has the knowledge and experience to help companies every step of the way, from deciding which initial licensing option would be best for them, to implementation and training, to maintenance and support, to determining when it is time to upgrade. Our Splunk consultants also have experience and expertise integrating Splunk software with existing technologies to build a unique complementary solution. We are looking forward to helping businesses make the most of this innovative new Splunk solution and partnering with them as they grow with this use case and other use cases in the future.”

You can download Splunk Insights for Infrastructure here.

About TekStream
We are “The Experts of Business & Digital Transformation,” but more importantly, we understand the challenges facing businesses and the myriad of technology choices and skillsets required in today’s “always on” companies and markets. We help you navigate the mix of transformative enterprise platforms, talent, and processes to create future-proof solutions in preparing for tomorrow’s opportunities – so you don’t have to. TekStream’s IT consulting solutions, combined with its specialized IT recruiting expertise, helps businesses increase efficiencies, streamline costs, and remain competitive in an extremely fast-changing market. For more information about TekStream, visit www.tekstream.com or email info@tekstream.com

Data Onboarding in Splunk


By: Joe Wohar | Splunk Consultant

Splunk is an amazing platform for analyzing any and all data in your business; however, you may not be getting the best performance out of Splunk if you're using the default settings. To get the best performance out of Splunk when ingesting data, it is important to specify as many settings as possible in a file called "props.conf" (commonly referred to as "props"). Props set ingestion settings per sourcetype, and if you do not put anything into props for your sourcetype, Splunk will automatically try to figure it out for you. While this can be a good thing when you're first beginning with Splunk, having Splunk figure out how to parse and ingest your data affects its overall performance. By configuring the ingestion settings manually, Splunk doesn't have to figure out how to ingest your data. These are the 8 settings that you should set for every sourcetype in order to get the best performance:

SHOULD_LINEMERGE – As the name suggests, this setting determines whether lines from a data source file are merged or not. If your data source file contains one full event per line, set this to "false"; if your data source file contains multiple lines per event, set this to "true". If you set this to "true", you'll also need to use other settings such as BREAK_ONLY_BEFORE or MUST_BREAK_AFTER to determine how to break the data up into events.

LINE_BREAKER – This setting divides up the incoming data based on a regular expression defining the "breaks" in the data. By default, this setting looks for newlines; however, if your events are all on the same line, you'll need to create a regular expression to divide the data into lines.

TRUNCATE – TRUNCATE will cut off an event if its number of characters exceeds the value set. The default is 10,000; it's a good idea to lower this to better fit your data, and it's absolutely necessary to increase it if your events exceed 10,000 characters.

TIME_PREFIX – This setting takes a regular expression for what precedes the timestamp in events so that Splunk doesn’t have to search through the event for the timestamp.

MAX_TIMESTAMP_LOOKAHEAD – This tells Splunk how far to check after the TIME_PREFIX for the full timestamp so that it doesn’t keep reading further into the event.

TIME_FORMAT – Define a timestamp in strftime format. Without this defined, Splunk has to go through its list of predefined timestamp formats to determine which one should be used.

EVENT_BREAKER – This setting should be set on the Splunk Forwarder installed on the host where the data originally resides. It takes a regular expression which defines the end of events so that only full events will be sent from the Splunk Forwarder to your indexers.

EVENT_BREAKER_ENABLE – This setting merely tells Splunk to start using the EVENT_BREAKER setting. It defaults to false, so when you use EVENT_BREAKER, you'll need to set this to "true".
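
Pulling these together, here is a sketch of a props.conf stanza for a hypothetical sourcetype whose events each sit on one line and begin with a timestamp like 2019-03-05 14:22:10.123. The stanza name, regular expressions, and time format are illustrative and must be adjusted to match your actual data, and remember that the EVENT_BREAKER settings belong in props.conf on the forwarder:

[my_custom_log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 5000
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 23
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)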

There are many other settings which can be used but as long as you have these defined, Splunk will perform much better than if it has to figure them out on its own. For more information on these settings, visit Splunk’s documentation on props: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf

If you have questions, or would like more information on data onboarding best practices, please contact us: 

Data Cleansing in Splunk

By: Zubair Rauf | Splunk Consultant

Data is the most important resource in Splunk, and having clean data ingestion is of utmost importance to drive better insights from machine data. The data onboarding process should not be blindly automated, and every step should be done carefully, as this process can determine the future performance of your Splunk environment.

When looking at the health of data in Splunk, the following metrics are important:

  • Data parsing
  • Automatically assigned sourcetypes
  • Event truncation
  • Duplicate events

Data parsing

Data parsing is the most important of these when it comes to monitoring data health in Splunk. It is the first step Splunk performs when data is ingested and indexed into different indexes. Data parsing includes event breaking, date and time parsing, truncation, and parsing out the fields that are important to the end user to drive better insights from the data.

Splunk best practices recommend using these six parameters when defining every sourcetype to ensure proper parsing.

  • SHOULD_LINEMERGE = false
  • LINE_BREAKER
  • MAX_TIMESTAMP_LOOKAHEAD
  • TIME_FORMAT
  • TIME_PREFIX
  • TRUNCATE

When these parameters are properly defined, Splunk indexers will not have to spend extra compute resources trying to understand the log files they have to ingest. In my experience auditing Splunk environments, the date is the one field Splunk has to work the hardest to parse if it is not properly defined within the parameters of the sourcetype.

Automatically assigned sourcetypes

Sometimes when Splunk sourcetypes are not defined correctly, Splunk starts using its resources to parse events automatically and creates similar sourcetypes with a number or tag appended. These sourcetypes will mostly have only a few events, and then another one will be created.

It is important to make sure that such sourcetypes are not being created, as they will again cause data integrity to be lost, and searches/dashboards will omit these sourcetypes because they are not part of the SPL queries that make up the dashboard. I have come across such automatically assigned sourcetypes at multiple deployments. It becomes necessary to revisit and rectify the errors in the sourcetype definition to prevent Splunk from doing this automatically.

Event truncation

Splunk truncates events by default when they exceed 10,000 bytes. Some events exceed that limit and are automatically truncated by Splunk; XML events commonly exceed it. When an event is truncated before it ends, that harms the integrity of the data being ingested into Splunk. Such events are missing information, so they are of no use in driving insights and they skew the overall results.

It is very important to always go back and monitor all sourcetypes for truncated events periodically so that any truncation errors can be fixed and data integrity can be maintained.
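
One way to spot truncation (a sketch; the exact warning text and field names can vary by Splunk version) is to search the internal splunkd logs for truncation warnings and count them by sourcetype:

index=_internal sourcetype=splunkd component=LineBreakingProcessor "Truncating line because limit"
| stats count by data_sourcetype, data_host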

Duplicate events

Event duplication is one more important area to consider when looking at data integrity. At a recent client project, I came across several hundred gigabytes of duplicated events in an environment that was ingesting almost 10 TB of data per day. Duplication of the data can be due to multiple factors, and sometimes while setting up inputs, the inputs themselves can be duplicated. Duplicate data poses a threat to the integrity of the data and the insights driven from it, and it also takes up unwanted space on the indexers.

Duplication of events should also be periodically checked, especially when new data sources are on-boarded. This is to make sure that no inputs were added multiple times. This human error can be costly. At a client where we found multiple gigabytes of duplication, 7 servers were writing their logs to one NAS drive, and then the same 7 servers were sending the same logs to Splunk. That caused duplicate events amounting to almost 100GB/day.
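
For a newly on-boarded source, a simple (if resource-intensive) spot check over a short time range is to look for identical raw events; the index and sourcetype below are placeholders:

index=<your_index> sourcetype=<your_sourcetype> earliest=-1h
| stats count by _time, host, source, _raw
| where count > 1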

Ensuring that the areas mentioned above have been addressed and problems rectified, would be a good starting point towards a cleaner Splunk environment. This would help save time and money, substantially improve Splunk performance at index and search time and overall help you drive better insights from your machine data.

If you have questions, or would like assistance with cleansing and improving the quality of your Splunk data, please contact us: 

Containerization and Splunk: How Docker and Splunk Work Together

By: Karl Cepull | Director, Operational Intelligence and Managed Services

Note: Much of the information in this blog post was also presented as a TekTalk, including a live demo of Splunk running in Docker, and how to use Splunk to ingest Docker logs. Please see the recording of the TekTalk at http://go.tekstream.com/l/54832/2017-03-30/bknxhd.

You’ve heard of Splunk, and maybe used it. You’ve heard of Docker, and maybe used it, too. But have you tried using them together? It can be a powerful combination when you do!
But first, let’s review what Splunk and Docker are, just to set a baseline. Also, learn more about TekStream’s Splunk Services.

What is Splunk?

Splunk is the industry-leading solution for turning “digital exhaust” into business value. “Digital exhaust” refers to the almost unlimited amount of data being output by just about every digital device in the world today, such as application and web servers, databases, security and access devices, networking equipment, and even your mobile devices.

Usually, this data is in the form of log files. And, due to the volume being produced, it usually just sits on a hard drive somewhere until it expires and is deleted. It is only looked at when something goes wrong, and requires a lot of digging and searching to find anything useful.

Splunk changes all of that. It ingests those log files in near-real-time, and provides a “Google-like” search interface, making it extremely easy to search large amounts of data quickly. It also can correlate the information in myriad log files, allowing for an easier analysis of the bigger picture. Finally, it has a plethora of visualization and alerting options, allowing you to create rich reports and dashboards to view the information, and generate various alerts when specific conditions are met.

What is Docker?

Docker is also an industry leader. It is a container manager, that allows you to run multiple applications (or containers) side by side in an isolated manner, but without the overhead of creating multiple virtual machines (VMs) to do so. Containers give you the ability to “build once, run anywhere,” as Docker containers are designed to run on any host that can run Docker. Docker containers can also be distributed as whole “images,” making it easy to deploy applications and microservices.

Why use Splunk and Docker Together?

While there are many ways that you can use Splunk and Docker, there are two main configurations that we will address.

Using Docker to run Splunk in a Container

Running Splunk as a container in Docker has a lot of advantages. You can create an image that has Splunk pre-configured, which makes it easy to fire up an instance for testing, a proof-of-concept, or other needs. In fact, Splunk even has pre-configured images of Splunk Enterprise and the Universal Forwarder available in the Docker Hub for you to download!

Using Splunk to Monitor a Docker Container

In this configuration, one or more Docker containers are configured to send their logs and other operational information to a Splunk instance (which can also be running in another container, if desired!). Splunk has a free app for Docker that provides out-of-the-box dashboards and reports that show a variety of useful information about the events and health of the Docker containers being monitored, which provides value without having to customize a thing. If you are also using Splunk to ingest log information from the applications and services running inside of the containers, you can then correlate that information with that from the container itself to provide even more visibility and value.

Our Demo Environment

To showcase both of the above use cases, Splunk has a repository in GitHub that was used at their .conf2016 event in September of 2016. You can download and use the instructions to create a set of Docker containers that demonstrate both running Splunk in a container, as well as using Splunk to monitor a Docker container.

If you download and follow their instructions, what you build and run ends up looking like the following:

There are 5 containers that are built as part of the demo. The ‘wordpress’ and ‘wordpress_db’ containers are sample applications that you might typically run in Docker, and are instances of publicly-available images from the Docker Hub. Splunk Enterprise is running in a container as well, as is an instance of the Splunk Universal Forwarder. Finally, the container named “my_app” is running a custom app that provides a simple web page, and also generates some fake log data so there is something in Splunk to search.

By using a shared Volume (think of it as a shared drive) that the WordPress database logs are stored on, the Splunk Universal Forwarder is used to ingest the logs on that volume using a normal “monitor” input. This shows one way to ingest logs without having to install the UF on the container with the app.

The HTTP Event Collector (HEC) is also running on the 'splunk' container, and is used to receive events generated by the 'my_app' application. This shows another way to ingest logs without using a UF.

Finally, HEC is also used to ingest events about the ‘wordpress’ and ‘wordpress_db’ containers themselves.

If you would like to see a demo of the above in action, please take a look at the recording of our TekTalk, which is available at http://go.tekstream.com/l/54832/2017-03-30/bknxhd

Here is a screenshot of one of the dashboards in the Docker app, showing statistics about the running containers, to whet your appetite.

How Does it Work?

Running Splunk in a Container

Running Splunk in a container is actually fairly easy! As mentioned above, Splunk has pre-configured images available for you to download from the Docker Hub (a public repository of Docker images).

There are 4 images of interest – two for Splunk Enterprise (a full installation of Splunk that can be used as an indexer, search head, etc.), and two for the Universal Forwarder. For each type of Splunk (Enterprise vs Universal Forwarder), there is an image that just has the base code, and an image that also contains the Docker app.

Here's a summary of the image names and details about each one:

  • splunk/splunk:6.5.2 – The base installation of Splunk Enterprise v6.5.2 (the current version available as of this writing).
  • splunk/splunk:6.5.2-monitor (also tagged splunk/splunk:latest) – The base installation of Splunk Enterprise v6.5.2, with the Docker app also installed.
  • splunk/universalforwarder:6.5.2 (also tagged splunk/universalforwarder:latest) – The base installation of the Splunk Universal Forwarder, v6.5.2.
  • splunk/universalforwarder:6.5.2-monitor – The base installation of the Splunk Universal Forwarder v6.5.2, with the Docker add-in also installed.

Get the image(s):

  1. If you haven’t already, download a copy of Docker and install it on your system, and make sure it is running.
  2. Next, create an account at the Docker Hub – you’ll need that in a bit.
  3. From a command shell, log in to the Docker Hub using the account you created in step 2, using the following command:

    docker login
  4. Now, download the appropriate image (from the list above) using the following command:
    docker pull <imagename>
    

Start the Container:

To run Splunk Enterprise in a Docker container, use the following command:

docker run -d \
   --name splunk \
   -e "SPLUNK_START_ARGS=--accept-license" \
   -e "SPLUNK_USER=root" \
   -p "8000:8000" \
   splunk/splunk

To run the Universal Forwarder in a Docker container, use the following command:

docker run -d \
  --name splunkuniversalforwarder \
  --env SPLUNK_START_ARGS=--accept-license \
  --env SPLUNK_FORWARD_SERVER=splunk_ip:9997 \
  --env SPLUNK_USER=root \
  splunk/universalforwarder

In both cases, the "docker run" command tells Docker to create and run an instance of a given image (the "splunk/splunk" image in this case). The "-d" parameter tells it to run as a "daemon" (meaning in the background). The "-e" (or "--env") parameters set various environment variables that are passed to the application in the container (more below), and the "-p" parameter tells Docker to map the host port 8000 to port 8000 in the container. (This is so we can go to http://localhost:8000 on the host machine to get to the Splunk web interface.)

So, what are those "-e" values? Below is a list of the various environment variables that can be passed to the Splunk image, what they do, and the Splunk command each one drives. If a variable only applies to Splunk Enterprise, it is noted.

  • SPLUNK_USER – User to run Splunk as. Defaults to 'root'.
  • SPLUNK_BEFORE_START_CMD, SPLUNK_BEFORE_START_CMD_n – Splunk command(s) to execute prior to starting Splunk. 'n' is 1 to 30; the non-suffixed command is executed first, followed by the suffixed commands in order (no breaks). Runs: ./bin/splunk <SPLUNK_BEFORE_START_CMD[_n]>
  • SPLUNK_START_ARGS – Arguments to the Splunk 'start' command. Runs: ./bin/splunk start <SPLUNK_START_ARGS>
  • SPLUNK_ENABLE_DEPLOY_SERVER – If 'true', enables the deployment server function. (Splunk Enterprise only.)
  • SPLUNK_DEPLOYMENT_SERVER – Deployment server to point this instance to. Runs: ./bin/splunk set deploy-poll <SPLUNK_DEPLOYMENT_SERVER>
  • SPLUNK_ENABLE_LISTEN, SPLUNK_ENABLE_LISTEN_ARGS – The port and optional arguments to enable Splunk to listen on. (Splunk Enterprise only.) Runs: ./bin/splunk enable listen <SPLUNK_ENABLE_LISTEN> <SPLUNK_ENABLE_LISTEN_ARGS>
  • SPLUNK_FORWARD_SERVER, SPLUNK_FORWARD_SERVER_n, SPLUNK_FORWARD_SERVER_ARGS, SPLUNK_FORWARD_SERVER_ARGS_n – One or more Splunk servers to forward events to, with optional arguments. 'n' is 1 to 10. Runs: ./bin/splunk add forward-server <SPLUNK_FORWARD_SERVER[_n]> <SPLUNK_FORWARD_SERVER_ARGS[_n]>
  • SPLUNK_ADD, SPLUNK_ADD_n – Any monitors to set up. 'n' is 1 to 30. Runs: ./bin/splunk add <SPLUNK_ADD[_n]>
  • SPLUNK_CMD, SPLUNK_CMD_n – Any additional Splunk commands to run after Splunk is started. 'n' is 1 to 30. Runs: ./bin/splunk <SPLUNK_CMD[_n]>

 

Splunking a Docker Container

There are 2 main parts to setting up your environment to Splunk a Docker container. First, we need to set up Splunk to listen for events using the HTTP Event Collector. Second, we need to tell Docker to send its container logs and events to Splunk.

Setting up the HTTP Event Collector

The HTTP Event Collector (HEC) is a listener in Splunk that provides for an HTTP(S)-based URL that any process or application can POST an event to. (For more information, see our upcoming TekTalk and blog post on the HTTP Event Collector coming in June 2017.) To enable and configure HEC, do the following:

  1. From the Splunk web UI on the Splunk instance you want HEC to listen on, go to Settings | Data inputs | HTTP Event Collector.
  2. In the top right corner, click the Global Settings button to display the Edit Global Settings dialog. Usually, these settings do not need to be changed. However, this is where you can set what the default sourcetype and index are for events, whether to forward events to another Splunk instance (e.g. if you were running HEC on a "heavy forwarder"), and the port to listen on (default of 8088 using SSL). Click Save when done.
  3. Next, we need to create a token. Any application connecting to HEC to deliver an event must pass a valid token to the HEC listener. This token not only authenticates the sender as valid, but also ties it to settings, such as the sourcetype and index to use for the event. Click the New Token to bring up the wizard.
  4. On the Select Source panel of the wizard, give the token a name and optional description. If desired, specify the default source name to use if not specified in an event. You can also set a specific output group to forward events to. Click Next when done.
  5. On the Input Settings panel, you can select (or create) a default sourcetype to use for events that don't specify one. Perhaps one of the most important options is on this screen – selecting a list of allowed indexes. If specified, events using this token can only be written to one of the listed indexes. If an index is specified in an event that is not on this list, the event is dropped. You can also set a default index to use if none is specified in an individual event.
  6. Click Review when done with the Input Settings panel. Review your choices, then click Submit when done to create the token.
  7. The generated token value will then be shown to you. You will use this later when configuring the output destination for the Docker containers. (You can find this value later in the list of HTTP Event Collector tokens.)
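
Before configuring Docker, you can verify that HEC and the new token are working by posting a test event with curl (a quick sketch; substitute your own Splunk host and the token value generated above). A successful call should return {"text":"Success","code":0}:

curl -k https://your.splunkserver.com:8088/services/collector/event \
  -H "Authorization: Splunk <your-HEC-token>" \
  -d '{"event": "HEC test event", "sourcetype": "httpevent"}'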

Configuring Docker to send to Splunk

Now that Splunk is set up to receive event information using HEC, let’s see how to tell Docker to send data to Splunk. You do this by telling Docker to use the ‘splunk’ logging driver, which is built-in to Docker starting with version 1.10. You pass required and optional “log-opt” name/value pairs to provide additional information to Docker to tell it how to connect to Splunk.

The various “log-opt” values for the Splunk logging driver are:

  • splunk-token (required) – Splunk HTTP Event Collector token.
  • splunk-url (required) – URL and port of the HTTP Event Collector, e.g. https://your.splunkserver.com:8088
  • splunk-source (optional) – Source name to use for all events.
  • splunk-sourcetype (optional) – Sourcetype of events.
  • splunk-index (optional) – Index for events.
  • splunk-format (optional) – Message format. One of "inline", "json", or "raw". Defaults to "inline".
  • labels / env (optional) – Docker container labels and/or environment variables to include with the event.

In addition to the above “log-opt” variables, there are environment variables you can set to control advanced settings of the Splunk logging driver. See the Splunk logging driver page on the Docker Docs site for more information.

Splunking Every Container

To tell Docker to send the container information for all containers, specify the Splunk logging driver and log-opts when you start up the Docker daemon. This can be done in a variety of ways, but below are two common ones.

  1. If you start Docker from the command-line using the 'dockerd' command, specify the "--log-driver=splunk" option, like this:

    dockerd --log-driver=splunk \
      --log-opt splunk-token=4222EA8B-D060-4FEE-8B00-40C545760B64 \
      --log-opt splunk-url=https://localhost:8088 \
      --log-opt splunk-format=json
  2. If you use a GUI to start Docker, or don’t want to have to remember to specify the log-driver and log-opt values, you can create (or edit) the daemon.json configuration file for Docker. (See the Docker docs for information on where this file is located for your environment.) A sample daemon.json looks like this:
       {
         "log-driver": "splunk",
         "log-opts": {
              "splunk-token": "4222EA8B-D060-4FEE-8B00-40C545760B64",
              "splunk-url": "https://localhost:8088",
              "splunk-format": "json"
         }
       }

Either of the above options will tell Docker to send the container information for ALL containers to the specified Splunk server on the localhost port 8088 over https, using the HEC token that you created above. In addition, we have also overridden the default event format of “inline”, telling Docker to instead send the events in JSON format, if possible.

Splunk a Specific Docker Container

Instead of sending the container events for ALL containers to Splunk, you can also tell Docker to just send the container events for the containers you want. This is done by specifying the log-driver and log-opt values as parameters to the “docker run” command. An example is below.

docker run --log-driver=splunk \
  --log-opt splunk-token=176FCEBF-4CF5-4EDF-91BC-703796522D20 \
  --log-opt splunk-url=https://splunkhost:8088 \
  --log-opt splunk-capath=/path/to/cert/cacert.pem \
  --log-opt splunk-caname=SplunkServerDefaultCert \
  --log-opt tag="{{.Name}}/{{.FullID}}" \
  --log-opt labels=location \
  --log-opt env=TEST \
  --env "TEST=false" \
  --label location=west \
  your/application

The above example shows how to set and pass environment variables (“TEST”) and/or container labels (“location”), on each event sent to Splunk. It also shows how you can use the Docker template markup language to set a tag on each event with the container name and the container ID.

Hints and Tips

Running Splunk in a Container

  • As of this writing, running Splunk in a Docker container has not been certified, and is unsupported. That doesn’t mean you can’t get support, just that if the problem is found to be related to running Splunk in the container, you may be on your own. However, Splunk has plans to support running in a container in the near future, so stay tuned!
  • One of the advantages to running things in containers is that the containers can be started and stopped quickly and easily, and this can be leveraged to provide scalability by starting up more instances of an image when needed, and shutting them down when load subsides.
    A Splunk environment, however, is not really suited for this type of activity, at least not in a production setup. For example, spinning up or shutting down an additional indexer due to load isn't easy – it needs to be part of a cluster, and clusters don't like their members to be going up and down.
  • Whether running natively, in a VM, or in a container, Splunk has certain minimum resource needs (e.g. CPU, memory, etc.). By default, when running in a container, these resources are shared by all containers. It is possible to specify the maximum amounts of CPU and memory a container can use, but not the minimum, so you could end up starving your Splunk containers.

Splunking a Docker Container

  • Definitely use the Docker app from Splunk! This provides out-of-the-box dashboards and reports that you can take advantage of immediately. (Hint: use the “splunk/splunk:6.5.2-monitor” image.)
  • Use labels, tags, and environment variables passed with events to enhance the event itself. This will allow you to perform searches that filter on these values.
  • Note that some scheduling tools for containers don’t have the ability to specify a log-driver or log-opts. There are workarounds for this, however.

Additional Resources

Below is a list of some web pages that I’ve found valuable when using Splunk and Docker together.

Happy Splunking!

Have more questions? Contact us today!
