Query AWS Resources with a Custom Search Command in Splunk

By: Bruce Johnson  | Director, Enterprise Security


Ever wondered what to make of all those resources living in your Cloud environment? Perhaps you’ve asked yourself, “What if I could leverage data from my cloud environment to enhance day-to-day Splunk operations?” The purpose of this blog is to show you how to quickly create a Splunk custom search command that will allow you to query AWS resources on the fly. New capabilities, eh? Well, maybe not so new. Custom search commands have been supported by Splunk for many years now; we’ll simply be shedding some light on how you can easily enhance the functionality of your Splunk environment to meet your business needs.

Splunk custom search commands are “commands that extend SPL to serve your specific needs.” Custom search commands can be Generating, Streaming, Transforming, and Dataset processing, each type serving a different purpose and functionality. For example, Generating search commands fetch data from a given location without performing any additional data processing or transformation. With a Splunk Generating search command, data can be natively fetched from a Splunk index or imported from any external source such as an API. In this blog, we’ll install a custom Splunk Generating Command that interacts with the AWS API to retrieve information about real-time compute and network resources deployed in a cloud environment.

Other types of Splunk search commands:

  • Streaming: process search results one-by-one, applying one transformation to each event that a search returns. A commonly used Splunk streaming command is the eval command.
  • Dataset processing: require the entire dataset in place before the command can run. For example, the sort command is a Dataset processing command that sorts the returned results based on the defined search fields.
  • Transforming: order search results into a data table. For example, chart, top, stats, timechart.

This blog walks you through the steps of installing a custom Splunk Generating command that allows you to query real-time resource information from your AWS cloud environment, such as EC2 Instances, EBS Volumes, Security Groups, Network Interfaces, Subnets, and VPCs. This custom Splunk Generating command uses the Splunk SDK to interact with the AWS API and imports data into Splunk for further event processing and correlation. It can be used to:

  • Fetch information about existing AWS resources
  • Create inventory lookups for AWS resources
  • Correlate external data with Splunk indexed data
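As a sketch of the inventory-lookup use case, the helper below flattens a DescribeInstances-style response into rows suitable for a Splunk lookup. The function name and sample response are illustrative; in the TA itself, the AWS calls live in aws_resources.py.

```python
# Hypothetical helper: flatten an EC2 DescribeInstances response into flat
# dicts that can be written out as a Splunk inventory lookup (CSV rows).

def instances_to_rows(response):
    """Flatten an EC2 DescribeInstances-style response into flat dicts."""
    rows = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            rows.append({
                "instance_id": inst.get("InstanceId", ""),
                "instance_type": inst.get("InstanceType", ""),
                "state": inst.get("State", {}).get("Name", ""),
                "private_ip": inst.get("PrivateIpAddress", ""),
            })
    return rows

# Sample response shaped like the AWS API output (values are placeholders)
sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-12345678", "InstanceType": "t3.micro",
             "State": {"Name": "running"}, "PrivateIpAddress": "10.0.0.5"},
        ]}
    ]
}

print(instances_to_rows(sample)[0]["instance_id"])  # prints i-12345678
```

A generating command would emit these rows as events, where they can be saved with outputlookup or correlated with indexed data.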


Prerequisites:

  • AWS account
  • Splunk Search Head hosted on an EC2 Instance
  • AWS IAM Role
  • AWS IAM Policy
  • Custom Search Command TA

To keep the blog simple, we’ll assume that our readers have a Splunk installation already launched in AWS. This blog will walk through the process of creating the necessary AWS role and policy to ensure the EC2 instance has the required permissions to query AWS resources. Once that’s taken care of, we’ll install the custom search command TA onto the Splunk Search Head.

Create AWS IAM Policy and Role:

  1. Log in to your AWS account via the AWS Management Console.
  2. Go to the IAM service.
  3. Create an IAM Policy with read-only permissions to describe the AWS resources you want to query.
  4. Create an IAM Role that inherits the IAM policy you created in step three.
  5. Attach the IAM role created to your Splunk Search Head.
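A minimal read-only policy covering the resource types queried in this blog might look like the following; the exact action list is an assumption, so adjust it to the resources you need.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVolumes",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs"
      ],
      "Resource": "*"
    }
  ]
}
```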

Install Custom Search Command on Splunk Search Head

  1. Initialize an SSH session to the Splunk Search Head.
  2. Clone the Custom Search Command repository from GitHub to $SPLUNK_HOME/etc/apps: git clone
  3. Restart Splunk: $SPLUNK_HOME/bin/splunk restart
  4. Search away!

Once installed, the command can query EC2 Instances, Security Groups, EBS Volumes, and Network Interfaces.
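As a sketch, querying each resource type might look like the following; the argument and field names here are hypothetical, so check the TA's searchbnf for the exact syntax.

```
| awssearch resource=ec2_instances
| table instance_id, instance_type, state

| awssearch resource=security_groups
| table group_id, group_name, vpc_id
```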

TA Components:

  • aws_resources.py: Defines the AWS client and the functions used to pull AWS resources
  • awssearch.py: Defines the Splunk Generating command used to pull AWS resources
  • splunklib: Splunk SDK for Python modules
  • commands.conf: Registers the “awssearch” SPL command
  • searchbnf.conf: Defines search-assistant configurations for the “awssearch” SPL command

Learn more about Splunk custom search commands. You can find the source code for the Splunk AWS Inventory TA on GitHub.

Contact us for more help on using custom search commands in Splunk!

Maximize WorkDocs Using AWS Lambda Functions

How AWS Lambda Functions Can Be Used to Automate and Advance WorkDocs Document Repository

  By: Courtney Dooley | Technical Architect


Amazon Web Services WorkDocs is a simple document repository with a user-friendly interface and search capabilities. A key element of WorkDocs is its API library, which makes it possible to extend and automate functionality from most third-party process workflows. Below are three ways Lambda functions can use the WorkDocs API library to automate Folder Automation, Permissions Management, and Comments & Labels.

Folder Automation

Custom REST APIs can be created using Lambda functions triggered by APIs created in AWS API Gateway. These functions can complete folder and file modifications within WorkDocs. The API calls can create a folder in a specific existing folder, or in a user's root folder, based on data provided in the API request. Even if some data isn't known up front, such as the parent folder ID, it can be looked up: given an owner's username and a folder name, the describeUsers and describeFolderContents APIs can return the parent folder ID.

If a folder or file needs to be renamed, it can be updated using the WorkDocs updateDocument and updateFolder APIs. The same functions can be used to move a file or folder from one location to another by specifying an alternative “ParentFolderId” parameter.

Finally, when content is ready for cleanup, a file or folder can be deleted using the deleteFolder, deleteDocument, and deleteFolderContents APIs. These functions move the content to the owner’s recycle bin where it can be restored or permanently deleted by the owner. The retention time limit for all recycle bins can be set via the administration settings within WorkDocs.
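The folder operations above can be sketched in a Lambda handler like this. boto3 is available in the Lambda runtime (it is imported lazily here so the module also loads where it is not installed); the folder IDs are placeholders. The pure helper builds the parameters that the UpdateFolder call (update_folder in boto3) consumes.

```python
# Sketch of WorkDocs folder automation for a Lambda function.
# Folder IDs below are placeholders.

def build_move_request(folder_id, new_parent_folder_id):
    """Build the UpdateFolder parameters that move a folder."""
    return {"FolderId": folder_id, "ParentFolderId": new_parent_folder_id}

def move_folder(folder_id, new_parent_folder_id, region="us-east-1"):
    """Move a WorkDocs folder by updating its ParentFolderId."""
    import boto3  # lazy import: available in the AWS Lambda runtime
    client = boto3.client("workdocs", region_name=region)
    client.update_folder(**build_move_request(folder_id, new_parent_folder_id))
```

The same pattern applies to update_document for files, and to the delete_folder / delete_document calls for cleanup.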

Permissions Management

Using the describeUsers WorkDocs API, a specific user can be identified by specifying the user’s login as the “Query” parameter. If the user has not been activated for WorkDocs, that activation can be done using the activateUser WorkDocs API. Once active, that user’s id can be added as an owner, co-owner, contributor, or viewer of any resource by using the addResourcePermissions API. All of these calls can be completed in the same Lambda function, with all details specified in a single request.

Specific users can be removed from a file or folder using the removeResourcePermission API, or all permissions can be removed using the removeAllResourcePermissions API. Removing all permissions can be used to start over and add new permissions to the resource as part of a custom “Replace All” functionality.

Users can also be notified of these changes by setting the “NotificationOptions” JSON parameter in addResourcePermissions. By setting the “EmailMessage” value and “SendEmail” to true within “NotificationOptions,” an email alert will be sent to the user informing them of the permissions change. Setting “SendEmail” to false keeps the notification from being sent, and the user will not know of the change other than seeing the resource in the “Shared with me” view.
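A permissions grant with optional email notification can be sketched as follows. The parameter shapes mirror the boto3 add_resource_permissions call (Principals plus NotificationOptions); the IDs and role values are placeholders.

```python
# Sketch: grant a user access to a WorkDocs resource, optionally notifying
# them by email. IDs below are placeholders.

def build_permission_request(resource_id, user_id, role="VIEWER",
                             email_message=None):
    """Build AddResourcePermissions parameters; omit the message to share silently."""
    return {
        "ResourceId": resource_id,
        "Principals": [{"Id": user_id, "Type": "USER", "Role": role}],
        "NotificationOptions": {
            "SendEmail": email_message is not None,
            "EmailMessage": email_message or "",
        },
    }

def add_permission(resource_id, user_id, role="VIEWER", email_message=None):
    import boto3  # lazy import: available in the AWS Lambda runtime
    client = boto3.client("workdocs")
    client.add_resource_permissions(
        **build_permission_request(resource_id, user_id, role, email_message))
```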

Users can also be deactivated, deleted, created, and modified using the WorkDocs API library to allow for a full range of provisioning via Lambda functions.

Comments & Labels

Although WorkDocs does not have metadata in the traditional sense (at least not at the time of this blog post), labels and comments can be searched, which gives a unique way to find content by the values set using those features.

Comments are available as feedback when viewing documents and are not available on folders. Labels, by contrast, are not exposed in the user interface at this time but can be set on folders and documents independently.

Lambda functions can be used to get, set, and delete comments or labels on a resource using the below set of API functions:

  • createComment
  • deleteComment
  • describeComments
  • createLabels
  • deleteLabels

The getFolder and getDocument responses include the labels assigned to the resource. deleteLabels accepts a “DeleteAll” parameter to remove every label rather than just a specific set.
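Label management can be sketched with small parameter builders; the shapes mirror the boto3 create_labels / delete_labels calls, and the resource IDs and label values are placeholders.

```python
# Sketch: manage labels on a WorkDocs resource via Lambda.

def build_create_labels(resource_id, labels):
    """Build CreateLabels parameters."""
    return {"ResourceId": resource_id, "Labels": labels}

def build_delete_labels(resource_id, labels=None):
    """Build DeleteLabels parameters; with no labels given, clear them all."""
    if labels is None:
        return {"ResourceId": resource_id, "DeleteAll": True}
    return {"ResourceId": resource_id, "Labels": labels}

def set_labels(resource_id, labels):
    import boto3  # lazy import: available in the AWS Lambda runtime
    client = boto3.client("workdocs")
    client.create_labels(**build_create_labels(resource_id, labels))
```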

All APIs mentioned in this blog are available out of the box with WorkDocs and can be combined to create a custom solution that meets the demands of custom web services, process workflows, and other integrated applications.

AWS offers a robust and expansive set of services to implement custom solutions quickly and easily.

Contact us for more tips and tricks on developing AWS Cloud Solutions!

Don’t Be a Karen: Rebuilding the Terraform State File and Best Practices for Backend State File Storage

  By: Brandon Prasnicki  |  Technical Architect


It happened. It finally happened. After talking to the manager, Contractor Karen quit. She was solely responsible for managing the project’s cloud architecture with Terraform. Now that Karen has left, a new resource needs to take her place and continue managing and building the cloud infrastructure. Luckily, the Terraform code was in a git repository (excluding the .terraform dir), but no one is sure whether it is up to date, and the state file was local to Karen’s machine and is not recoverable. What to do now?

  1. Don’t be a Karen. Make it a company policy to configure the backend. A Terraform backend is the configuration on how (and where) to store your Terraform state in a centralized, remote location.
    • A shared-resources account or a production account is a good place to store Terraform state files.
    • A remote backend is also a must for shared development environments.
  2. Use a versioned bucket. State files can get corrupt, and you may need to revert to an old version of the state file.
  3. Configure the backend. For each unique terraform state, make sure to update the key path to be reflective of the workload architecture the state file is associated with:
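A sketch of an S3 backend block, assuming a versioned bucket; the bucket name, key path, and region are placeholders to adapt to your workload.

```hcl
terraform {
  backend "s3" {
    bucket = "example-shared-terraform-states"  # versioned S3 bucket
    key    = "prod/networking/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}
```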

If it’s already too late, and you have been victimized by a Karen, then it’s time to rebuild the state file.

  1. Depending on the size of your workload, this will be a time-consuming process.
  2. For each resource, you will need to identify the key needed to import it into the state. Reference the Terraform documentation for the resource type. For example:
    • a. For a VPC, the documentation shows that you need the VPC ID:
      terraform import aws_vpc.test_vpc vpc-a01106c2
    • b. For an EC2 instance, you need the instance ID:
      terraform import aws_instance.web i-12345678
  3. After each import, run a plan and confirm it does not propose changes you are not anticipating; correct the code if it does. This process will take time.

Contact us for more help on rebuilding Terraform State Files!

Accessing or Restricting Website Content by Geolocation

      By: Stuart Arnett  |  Principal Architect

In general, a public-facing website is accessible to the entire world. However, there might be times when you want to allow or restrict content based on the geolocation of the user accessing the site. There are several reasons for doing this; the following are some examples:

  • Security – restrict access to the website based on country (e.g., only United States users can access the website, or all countries except China can)
  • Translations – automatically display the website in the primary language of the country the user is located in
  • Targeted Content – display different content to the end user (e.g., redirect the user to country-specific domains of the website or provide access to different pages)

To perform any of the above scenarios, the web server needs to know the geolocation of the user, which is determined from the user’s IP address. This poses a problem because the location associated with an IP address can change when addresses are re-assigned. Luckily for us, MaxMind, Inc. (https://www.maxmind.com) provides GeoIP databases and services, available in both free and paid versions. The free versions are not as accurate as the paid versions and only provide country, city, and ASN databases. We will be focusing on the free version of the GeoIP databases.

MaxMind also provides tools to keep the databases updated and a module for Apache HTTP server to query the databases to determine the user’s country, city, and/or ASN data. We will be setting up both the GeoIP database update tool and configuring Apache HTTP server module in the demo.

Demo Environment

We will be using the following for setting up the demo:

  • Operating System: Amazon Linux 2
  • Web Server: Apache HTTP Server 2.4
  • Geolocation: MaxMind GeoLite2 Databases, MaxMind GeoIP Update, and MaxMind DB Apache Module

MaxMind Account Setup

MaxMind requires you to set up an account and create a license key to access the MaxMind GeoLite2 databases.

  1. Navigate to https://www.maxmind.com/en/geolite2/signup and set up a new account.
  2. Once your account is set up, navigate to https://www.maxmind.com/en/my_license_key to create a new license key for GeoIP Update versions older than 3.1.1.

Note: Make sure to copy the license key, because it is not accessible after you leave the creation page.

Install and Configure MaxMind

We will now install the MaxMind databases, the GeoIP Update application, and the Apache HTTP module on Amazon Linux 2.

  1. Log into your Amazon Linux 2 instance. You will need an account with root access.
  2. Enable the EPEL repository on Amazon Linux 2 to access the MaxMind packages:
    sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
  3. Install MaxMind databases and development libraries using the following command:
    sudo yum install -y libmaxminddb-devel
  4. Amazon Linux 2 already includes an older version of the MaxMind GeoIP Update package (GeoIP) which will work but does not meet MaxMind’s latest security requirements. A request has been added to the AWS update backlog to update the package. The installed version will be used for this demo, but the configuration below will change when Amazon upgrades to the newer version.
  5. Update the MaxMind GeoIP Update configuration in /etc/GeoIP.conf:
    sudo vi /etc/GeoIP.conf
    Update the UserId and LicenseKey values with the MaxMind Account ID and License Key created during your MaxMind account setup above. Update the ProductIds line with the values specified below (GeoLite2-ASN GeoLite2-City GeoLite2-Country); the Product IDs determine which databases will be updated.

    # Enter your license key here
    # customers should insert their license key and user_id
    # free GeoLite users should use 000000000000 as license key
    LicenseKey 000000000000
    # Enter your User ID here ( GeoLite only users should use 999999 as user_id )
    UserId 999999
    # Enter the Product ID(s) of the database(s) you would like to update
    # By default 106 (MaxMind GeoIP Country) is listed below
    ProductIds GeoLite2-ASN GeoLite2-City GeoLite2-Country
  6. Update the MaxMind GeoIP databases:
    /usr/bin/geoipupdate -v
    The -v option enables verbose mode to provide output while the update runs. For future reference, the database files are stored in the /usr/share/GeoIP/ folder.
  7. MaxMind updates the GeoIP databases on Tuesdays, US Eastern time. Configure a crontab job to execute the geoipupdate program on Wednesdays to keep the databases updated with the latest information:
    crontab -e
    Add the following line to run the job on Wednesdays at 4am UTC; adjust the time based on your server’s time zone.
    00 04 * * 3 sudo /usr/bin/geoipupdate
  8. Install the MaxMind Apache HTTP module using the following commands:
    sudo yum install -y gcc
    sudo yum install -y httpd-devel
    mkdir ~/download/
    cd ~/download
    wget https://github.com/maxmind/mod_maxminddb/releases/download/1.2.0/mod_maxminddb-1.2.0.tar.gz
    tar xvzf mod_maxminddb-1.2.0.tar.gz
    cd mod_maxminddb-1.2.0
    ./configure
    make
    sudo make install

    The above commands install the gcc and httpd-devel packages, which are required to build the mod_maxminddb Apache HTTP module. The module tarball is then downloaded, built, and installed, which adds the library to the Apache HTTP modules folder and updates httpd.conf to load the new library.

Install and Configure Apache HTTP Server

We now need to set up and configure the Apache HTTP Server.

  1. Log into your Amazon Linux 2 instance. You will need an account with root access.
  2. Install Apache HTTP Server using the following command:
    sudo yum install -y httpd
  3. Start the Apache HTTP Server and verify it is working by navigating to your server with a web browser. Make sure the Apache Test page is displayed.
    sudo systemctl start httpd
  4. Create a new httpd configuration file with the MaxMind configuration:
    sudo vi /etc/httpd/conf.d/geolocation.conf
    Add the following to the file:

    <VirtualHost *:80>
    <IfModule mod_maxminddb.c>
    MaxMindDBEnable On
    MaxMindDBFile ASN_DB /usr/share/GeoIP/GeoLite2-ASN.mmdb
    MaxMindDBFile CITY_DB /usr/share/GeoIP/GeoLite2-City.mmdb
    MaxMindDBFile COUNTRY_DB /usr/share/GeoIP/GeoLite2-Country.mmdb
    MaxMindDBEnv GEOIP_ASN ASN_DB/autonomous_system_number
    MaxMindDBEnv GEOIP_ASORG ASN_DB/autonomous_system_organization
    # MaxMindDBEnv GEOIP_CONTINENT_CODE CITY_DB/continent/code
    # MaxMindDBEnv GEOIP_CONTINENT_NAME CITY_DB/continent/names/en
    # MaxMindDBEnv GEOIP_COUNTRY_CODE CITY_DB/country/iso_code
    # MaxMindDBEnv GEOIP_COUNTRY_NAME CITY_DB/country/names/en
    MaxMindDBEnv GEOIP_CITY_NAME CITY_DB/city/names/en
    MaxMindDBEnv GEOIP_LONGITUDE CITY_DB/location/longitude
    MaxMindDBEnv GEOIP_LATITUDE CITY_DB/location/latitude
    MaxMindDBEnv GEOIP_CONTINENT_NAME COUNTRY_DB/continent/names/en
    MaxMindDBEnv GEOIP_COUNTRY_CODE COUNTRY_DB/country/iso_code
    MaxMindDBEnv GEOIP_COUNTRY_NAME COUNTRY_DB/country/names/en
    RewriteEngine On
    RewriteCond %{REQUEST_URI} !^/geoip.html$
    RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^US$
    RewriteRule .* /geoip.html?continentName=%{ENV:GEOIP_CONTINENT_NAME}&countryCode=%{ENV:GEOIP_COUNTRY_CODE}&countryName=%{ENV:GEOIP_COUNTRY_NAME}&cityName=%{ENV:GEOIP_CITY_NAME}&latitude=%{ENV:GEOIP_LATITUDE}&longitude=%{ENV:GEOIP_LONGITUDE} [R,L]
    </IfModule>
    </VirtualHost>

    • MaxMindDBEnable enables or disables the MaxMind DB Apache module.
      MaxMindDBEnable On|Off
    • MaxMindDBFile associates a name with a MaxMind database file on disk.
      MaxMindDBFile <Name> <Filesystem Path to Database File>
    • MaxMindDBEnv assigns a database value to an environment variable. The value is looked up based on the IP address making the current request.
      MaxMindDBEnv <ENV Variable Name> <Database Name>/<Path to Data>
    • Environment variable values are accessed using the following syntax:
      %{ENV:<ENV Variable Name>}

    An Apache rewrite condition (RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^US$) is used in the above configuration to redirect all United States users accessing the website to the geoip.html page. The query parameters appended to the URL display the additional data that is available within the MaxMind GeoLite2 database files and are for reference only.

    Note: If you are located outside of the United States, update the country code accordingly so you can verify the solution is working when you test.
  5. Create the HTML file geoip.html:
    sudo vi /var/www/html/geoip.html
    Add the following to the file:
    <h1>Geolocation Test Page</h1>
  6. Restart the Apache HTTP Server to load the new configuration.

    sudo systemctl restart httpd
  7. Navigate to the root page of your website. Assuming your IP is within the United States, you will be redirected to the geoip.html page. Review the query parameters added to the URL to view the additional data that is available within the MaxMind GeoLite2 database files based on your IP address.

You should now have a working website that allows geolocation-based rules to be applied to direct traffic. You can use this as a base, modifying the configuration with any additional rules you require for your website’s geolocation business requirements.
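For example, the security scenario mentioned earlier (blocking a specific country) can be sketched with a similar rewrite rule; the ISO country code here is illustrative.

```apache
RewriteEngine On
# Return 403 Forbidden for requests whose IP geolocates to country code CN
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CN$
RewriteRule .* - [F,L]
```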

Want to learn more about geolocation rules and exclusions? Contact us today!

Share and Share Alike: Using a Data Model Job Server for Shared Data Model Accelerations

      By: Jon Walthour  |  Senior Splunk Consultant, Team Lead


Consider the following Splunk apps, add-ons, and use cases:

  • Common Information Model (CIM)
  • DBX Health Dashboards
  • Palo Alto app
  • Splunk Global Monitoring Console
  • Infosec
  • CIM Validator
  • CIM Usage Dashboards
  • ArcSight CEF data models add-on
  • SA-Investigator
  • Threat hunting

All these Splunk apps and add-ons, and many others, use data models to power their searches. For a data model-powered search to function at peak performance, the data model is often accelerated. This means that at regular, frequent intervals, the searches that define these data models are run by Splunk and the results are summarized and stored on the indexers. And, because of the design of data models and data model accelerations, this summarized data stored on the indexers is tied to the search head or search head cluster that created it.

So, imagine it: You’re employing many different apps and add-ons in your Splunk deployment that all require these data models. Many times you need the same data models accelerated on several different search heads for different purposes. All these data models on all these search heads running search jobs to maintain and keep their summarized data current. All this summarized data is stored again and again on the indexers, each copy of a bucket’s summary data identical, but tied to a different search head.

In a large distributed deployment with separate search heads or search head clusters for Enterprise Security, IT Service Intelligence, adhoc searching, etc., you end up accelerating these data models everywhere you want to use them—on each search head or search head cluster, on your Monitoring Console instance, on one or more of your heavy forwarders running DB Connect, and more. That’s a lot of duplicate searches consuming CPU and memory on both your search heads and your indexers and duplicate accelerated data-consuming storage on those indexers.

There is a better way, though. Beginning with version 8.0, you can now share data models across instances—run once, use everywhere in your deployment that uses the same indexers. You accelerate the data models as usual on Search Head 1. Then, on Search Head 2, you direct Splunk to use the accelerated data created by the searches run on Search Head 1. You do this in datamodel.conf on Search Head 2 under the stanzas for each of the data models you want to share by adding the setting “acceleration.source_guid” like this:

[<data model name>]
acceleration.source_guid = <GUID of Search Head 1>

You get the GUID from one of two places. If a standalone search head created the accelerated data, the GUID is in $SPLUNK_HOME/etc/instance.cfg. If the accelerated data was created by data model searches run on a search head cluster, you will find the GUID for the cluster in server.conf on any cluster member in the [shclustering] stanza.
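Putting it together, the stanza on Search Head 2 might look like the following. The data model name and GUID are placeholders, and the file is shown in the CIM app's local directory on the assumption that the accelerations were generated in that app on Search Head 1.

```ini
# $SPLUNK_HOME/etc/apps/Splunk_SA_CIM/local/datamodel.conf on Search Head 2
[Network_Traffic]
acceleration.source_guid = 0a1b2c3d-4e5f-6789-abcd-ef0123456789
```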

That’s it, but there are a few “gotchas” to keep in mind.

First, keep in mind that everything in Splunk exists in the context of an app, also known as a namespace. So, the data models you’re accelerating are defined in the context of an app. Thus, the datamodel.conf you’re going to have on the other search heads with the “acceleration.source_guid” setting must be defined in the same namespace (the same app) as the one in which the data model accelerations are generated on the originating search head.

Second, once you set up this sharing, you cannot edit the data models on the search heads sharing the accelerated data (Search Head 2, in our example above) via Splunk web. You have to set up this sharing via the command line, and you can only edit it via the command line. You will also not be able to rebuild the accelerated data on the sharing search heads for obvious reasons, as they did not build the accelerated data in the first place.

Third, as with all other things in multisite indexer clusters, sharing data model accelerations gets more complicated. Since the summary data hitches a ride with the primary buckets in a multisite deployment, which end up spread across the sites, while search heads get “assigned” to particular sites, you want to set “summary_replication” to “true” in the [clustering] stanza in server.conf. This ensures that every searchable copy of a bucket, not just the primary bucket, has a copy of the accelerated data, and that searches of summary data are complete. There are other ways to deal with this issue, but I’ve found that simply replicating the accelerated data to all searchable copies is the best way to ensure no missing data and no duplicates.

Finally, when you’re running a tstats search against a shared data model, always use summariesonly=true. Again, this ensures a consistent view of the data as unsummarized data could introduce differing sources and thus incorrect results. One way to address this is to ensure the definition of the indexes that comprise the sources for the data models in the CIM (Common Information Model) add-on are consistent across all the search heads and search head clusters.
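A tstats search against a shared data model might look like the following; the data model and field names are illustrative.

```
| tstats summariesonly=true count
    from datamodel=Authentication
    where Authentication.action="failure"
    by Authentication.user
```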

And this leads us to the pièce de résistance, the way to take this feature to a whole new level: Install a separate data model acceleration search head built entirely for the purpose of running the data model accelerations. It does nothing else, as in a large deployment, accelerating all the data models will keep it quite busy. Now, this means this search head will need plenty of memory and plenty of CPU cores to ensure the acceleration search jobs run smoothly, quickly, and do not queue up waiting for CPU resources or, worse yet, get skipped altogether. The data models for the entire deployment are managed on this job server. They are all accelerated by this instance and every other search head and search head cluster has a datamodel.conf where all data model stanzas have an “acceleration.source_guid” setting pointing to this data model job search head.

This gives you two big advantages. First, all the other search heads and clusters are freed up to use the accelerated data models without having to expend the resources to maintain them. It separates the maintenance of the data model accelerations from the use of them. Even in an environment where only one search head or search head cluster is utilizing these accelerated data models, this advantage alone can be significant.

So often in busy Enterprise Security implementations, you can encounter significant skipped search ratios because regularly run correlation searches collide with regularly run acceleration jobs and there just aren’t enough host resources to go around. By offloading the acceleration jobs to a separate search head, this risk of data model content loss because of skipped accelerations or missed notable events because of skipped correlation searches is greatly diminished.

Second, since only one instance creates all the data models, there is only one copy of the summary data on the indexers, not multiple duplicate copies for various search heads, saving potentially gigabytes of disk space. And, since the accelerations are only run once on those indexers, indexer resources are freed up to handle more search load.

In the world of medium and large distributed Splunk deployments, Splunk instances get specialized—indexers do indexing, search heads do searching. We also often have specialized instances for the Monitoring Console, the Cluster Manager, the Search Head Cluster Deployer, and complex modular inputs like DBConnect, Splunk Connect for Syslog, and the AWS add-ons. The introduction of Splunk Cloud has brought us the “Inputs Data Manager,” or IDM, instance for these modular inputs. I offer to you that we should add another instance type to this repertoire—the DMA instance to handle all the data model accelerations. No decently-sized Splunk deployment should be without one.

Want to learn more about data model accelerations? Contact us today!

Using Child Playbooks in Splunk Phantom

      By: Joe Wohar  |  Senior Splunk Consultant


Splunk Phantom is an amazing SOAR platform that can really help your SOC automate your incident response processes. It allows you to build playbooks, which are Python scripts under the covers, that act on security events ingested into the platform. If you have a well-defined process for handling your security events and incidents, you can build a Splunk Phantom playbook to run through that entire process, saving your security analysts time and allowing them to work on more serious incidents.

A common occurrence with Splunk Phantom users is that they create a playbook that they want to use in conjunction with other playbooks. For example, a security analyst created three playbooks: a phishing playbook, an unauthorized admin access playbook, and a retrieve user information playbook. In both a phishing event and an unauthorized admin access event, they’d like to retrieve user information. Therefore, the analyst decides to have each of those playbooks call the “retrieve user information” playbook as a child playbook. However, when calling another playbook as a child playbook, there are a few gotchas that you need to consider.

Calling Playbooks Synchronously vs. Asynchronously

When adding a playbook block to a playbook, there are only two parameters: Playbook and Synchronous. The Playbook parameter is simple: choose the playbook you’d like to run as a child playbook. The Synchronous option allows you to choose whether or not you’d like to run the child playbook synchronously.

A screenshot from Splunk Phantom showing options for retrieving user information from a playbook.

Choosing “OFF” will cause the child playbook to run asynchronously. This means that the child playbook is called to run on the event the parent playbook is running against, and then the parent playbook continues down the path. If you’ve called the child playbook at the end of the parent playbook, then the parent playbook will finish running and the child playbook will continue running separately.

Choosing “ON” means that the parent playbook will call the child playbook and wait for it to finish running before moving on to the next block. So when a child playbook is called, you have two playbooks running at the same time on the event. This means that every synchronous child playbook is a performance hit to your Splunk Phantom instance. It is best to avoid running child playbooks synchronously unless absolutely necessary due to the performance impact.

Since there are cases where you might need the child playbook to be synchronous, there are a few tips to avoid causing too much of a performance impact.

  1. Keep your child playbooks short and simple. You want your child playbook to finish running quickly so that the parent playbook can resume.
  2. Avoid adding prompts into child playbooks. Prompts wait for a user to take an action. If you put a prompt into a child playbook, the parent playbook has to wait for the child playbook to finish running and the child playbook has to wait for user input.
  3. Avoid using “no op” action blocks from the Phantom app. The “no op” action causes the playbook to wait for a specified number of seconds before moving on to the next block in the path. The “no op” block causes the child playbook to take longer to run, which you usually want to avoid, but there are instances where you may need to run a “no op” action in a child playbook (covered later).
  4. When using multiple synchronous child playbooks, run them in series, not parallel. Running synchronous child playbooks in series ensures that at any given time during the parent playbook’s run, only two playbooks are running at the same time: the parent playbook and one child playbook.

Sending Data Between Parent and Child Playbooks in Splunk Phantom

When calling a child playbook, the only thing that is carried over to the child playbook is the event id number. None of the block outputs from the parent playbook are carried into the child playbook. This creates the problem of how to get data from a parent playbook into a child playbook. There are two main ways of doing this: add the data to a custom list or add data to an artifact in the container. The first option, adding data to a custom list, is a very inconvenient option due to how difficult it is to get data out of a custom list. Also, custom lists are really designed to be a list for checking values against, not storing data to be pulled later.

Adding data to an artifact in the container can be done in two different ways: update an artifact or create a new artifact. Adding data to an artifact is also much easier than adding and updating data in a custom list because there are already actions created to do both tasks in the Phantom app for Phantom: update artifact and add artifact. “Update artifact” will require you to have an artifact id as a reference so it knows which artifact in the container to update. Adding an artifact is simpler because you can always add an artifact, but you can only update an artifact if one exists.

When adding an artifact, there can be a slight delay between the time the action runs and when the artifact is actually added to the container. My advice here is when you add an artifact to a container that you want to pull data from in the child or parent playbook, add a short wait action (you only need it to wait 2 seconds) immediately after the “add artifact” action. You can have the playbook wait by adding a “no op” action block from the Phantom app for Phantom (which you should already have installed if you’re using the add artifact and update artifact actions).
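To make the handoff concrete, here is a minimal Python sketch of the JSON body that an “add artifact” call ultimately sends to Phantom’s REST artifact endpoint. The field names and the `run_automation` flag are based on the Phantom REST API and should be checked against your Phantom version; the container id, CEF values, hostname, and token in the commented live call are placeholders.

```python
import json

def build_artifact_payload(container_id, name, cef_fields, label="event"):
    """Build the JSON body for Phantom's REST artifact endpoint.

    Field names follow Phantom's /rest/artifact API; verify them
    against the documentation for your Phantom version.
    """
    return {
        "container_id": container_id,   # the event id the parent playbook is running on
        "name": name,                   # shows up in the container's artifact list
        "label": label,
        "cef": cef_fields,              # the data the child playbook will read
        "run_automation": False,        # don't re-trigger active playbooks on ingest
    }

# Example: hand off an enrichment result from a parent playbook.
payload = build_artifact_payload(
    container_id=1234,
    name="parent_handoff",
    cef_fields={"sourceAddress": "10.0.0.5", "riskScore": "87"},
)
print(json.dumps(payload, indent=2))

# Posting the payload (commented out; requires a live Phantom instance):
# import requests
# requests.post("https://phantom.example.com/rest/artifact",
#               json=payload, verify=False,
#               headers={"ph-auth-token": "<token>"})
```

Inside a playbook you would normally use the “add artifact” action from the Phantom app for Phantom instead; the payload above just shows what data is being carried across.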

Documentation Tips for Parent and Child Playbooks in Splunk Phantom

When creating a child playbook that you plan to use in multiple parent playbooks, documentation will really help you manage your playbooks in the long run. Here are a couple of quick tips for making your life easier.

  1. Use a naming convention for the child playbooks at least. I’d definitely recommend using a naming convention for all of your playbooks, but if you don’t want to use a naming convention for parent playbooks, at the very least use one for the child playbooks. Adding something like “ – [Child]” will really make it easier to find child playbooks and manage them.
  2. Put the required fields for the child playbook into the playbook’s description. Calling a child playbook is very easy, but if your parent playbook isn’t using the same CEF fields as the child playbook, you’re going to have a problem. Adding this list to the description will help let you know if you need to update your container artifact to add those needed fields or not.

Follow these tips and tricks and you’ll be setting yourself up for a performant and easy-to-manage Splunk Phantom instance for the long term.

Want to learn more about using playbooks in Splunk Phantom? Contact us today!

Deep Freeze Your Splunk Data in AWS, Part 2

      By: Zubair Rauf  |  Senior Splunk Consultant, Team Lead

In Part 1 of this blog post, we touched on the need to freeze Splunk data in AWS S3. In that post, we described how to do this using a script to move the Splunk bucket into S3. In this post, we will describe how to accomplish the same result by mounting the S3 bucket on every indexer using S3FS-Fuse, then telling Splunk to just move the bucket to that mountpoint directly. S3FS is a package available in the EPEL Repository. EPEL (Extra Packages for Enterprise Linux) is a repository that provides additional packages for Linux from the Fedora sources.

High Level Process

  1. Install S3FS-Fuse
  2. Mount S3 Bucket using S3FS to a chosen mountpoint
  3. Make the mountpoint persistent by updating the rc.local script
  4. Update the index to use ColdToFrozenDir when freezing data
  5. Verify frozen data exists in S3 bucket

Dependencies and Installation

The following packages are required to make this work:

Package                Repository
S3FS-Fuse              epel
dependency: fuse       amzn2-core
dependency: fuse-lib   amzn2-core

Note: This test was done on instances running CentOS 7 that did not have the EPEL repo configured for yum, so we had to install EPEL as well before proceeding with the S3FS-Fuse installation.

Install S3FS-Fuse

The following commands were used to install EPEL and S3FS-Fuse on the test indexers (they can also be scripted once you have verified they work in your environment). Run them as root on the indexer hosts.

cd /tmp
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install ./epel-release-latest-7.noarch.rpm
rpm --import https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-7
yum -y install s3fs-fuse

Mounting the S3 Bucket

Use the following commands to mount the S3 Bucket to a new folder called frozen-s3 in /opt/splunk/data.

Note: This method uses the passwd-s3fs file to authenticate to the S3 bucket. Please ensure that the AWS credentials you use have access to the S3 bucket; they must belong to a user with an access key and secret key generated. I created a user, ‘splunk_access,’ which has a role, ‘splunk-s3-archival,’ attached to it. This role has explicit permissions to access my test S3 bucket.

The S3 bucket has the following JSON policy attached to it, which gives the ‘splunk-s3-archival’ role full access to the bucket. The account_id in the policy is your 12-digit AWS account number.


{
  "Version": "2012-10-17",
  "Id": "Policy1607555060391",
  "Statement": [
    {
      "Sid": "Stmt1607555054806",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account_id>:role/splunk-s3-archival"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::splunk-to-s3-frozen-demo"
    }
  ]
}

The following commands should be run as root on the server. Please make sure to update the following variables listed inside < > in the commands with your respective values:

  • splunk_user → The user that Splunk will run as
  • aws_region → The AWS region your S3 bucket was created in
  • bucket_name → S3 bucket name
  • mount_point → Path to the directory where the S3 bucket will be mounted

These commands can be run on one indexer manually to test in your environment and scripted for the remaining indexers.

cd /opt/splunk/data
mkdir frozen-s3
cd /opt/splunk/data/frozen-s3
sudo vi /home/<splunk_user>/.s3fs/passwd-s3fs ## Add AWS Access_key:Secret_key in this file
sudo chmod 600 /home/<splunk_user>/.s3fs/passwd-s3fs
su <splunk_user> -c 's3fs -d -o passwd_file=/home/<splunk_user>/.s3fs/passwd-s3fs,allow_other,endpoint=<aws_region> <bucket_name> <mount_point>'
echo "su <splunk_user> -c 's3fs -d -o passwd_file=/home/<splunk_user>/.s3fs/passwd-s3fs,allow_other,endpoint=<aws_region> <bucket_name> <mount_point>'" >> /etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local

Adding the mount command to the rc.local script ensures that the rc.local script mounts the S3 bucket on boot.
Once you have manually mounted the S3 bucket, you can use the following command to verify the bucket has mounted successfully.

df -h
## I get the following output from df -h

[centos@s3test ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 485M 0 485M 0% /dev
tmpfs 495M 0 495M 0% /dev/shm
tmpfs 495M 6.8M 488M 2% /run
tmpfs 495M 0 495M 0% /sys/fs/cgroup
/dev/xvda2 10G 3.3G 6.8G 33% /
tmpfs 99M 0 99M 0% /run/user/1001
s3fs 256T 0 256T 0% /opt/splunk/data/frozen-s3
tmpfs 99M 0 99M 0% /run/user/1000
tmpfs 99M 0 99M 0% /run/user/0

The s3fs filesystem entry (mounted on /opt/splunk/data/frozen-s3) is the S3 bucket mounted using S3FS.

Setting Up Test Index

Create a test index with the following settings and push it out to all indexers through the Cluster Master:

[s3-test]
homePath = $SPLUNK_DB/s3-test/db
coldPath = $SPLUNK_DB/s3-test/colddb
thawedPath = $SPLUNK_DB/s3-test/thaweddb
frozenTimePeriodInSecs = 600
maxDataSize = 10
maxHotBuckets = 1
maxWarmDBCount = 1
coldToFrozenDir = $SPLUNK_HOME/data/frozen-s3/_index_name

The coldToFrozenDir parameter in the above stanza defines where Splunk should freeze the data for this index; it needs to be set for every index you wish to freeze. Splunk automatically replaces the _index_name variable in the coldToFrozenDir parameter with the index name (s3-test in this case), which makes it easier to copy the parameter across multiple individual index stanzas.

For testing purposes, the frozenTimePeriodInSecs, maxDataSize, maxHotBuckets, and maxWarmDBCount have been set very low. This is to ensure that the test index rolls data fast. In production, these values should either be left as the default or changed in consultation with a Splunk Architect.

The index needs to be setup/updated on the Cluster Master and configurations pushed out to all indexers.

Adding and Freezing Data

Once the indexers have been restarted with the correct index settings, upload sample data from the UI. Keep adding sample data files to the index until it starts to roll hot and warm buckets to cold and, eventually, frozen. You will then start to see your frozen data appear in your S3 bucket.
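As a rough way to verify the frozen data from outside the indexers, the sketch below lists frozen bucket directories in the S3 bucket using boto3 (the AWS SDK for Python). It assumes the conventional Splunk bucket directory naming (db_<newestTime>_<oldestTime>_<id>) under a per-index prefix; the bucket and index names match the examples above but are otherwise placeholders.

```python
def frozen_bucket_dirs(list_objects_response, index_name):
    """Extract Splunk frozen bucket directory names (db_<newest>_<oldest>_<id>)
    for a given index from an S3 ListObjectsV2 response dict."""
    dirs = set()
    for obj in list_objects_response.get("Contents", []):
        parts = obj["Key"].split("/")
        # Expect keys like: <index_name>/db_1607555060_1607550000_12/rawdata/journal.gz
        if len(parts) >= 2 and parts[0] == index_name and parts[1].startswith("db_"):
            dirs.add(parts[1])
    return sorted(dirs)

# Sample response shaped like boto3's s3.list_objects_v2() output:
sample = {
    "Contents": [
        {"Key": "s3-test/db_1607555060_1607550000_12/rawdata/journal.gz"},
        {"Key": "s3-test/db_1607559900_1607555100_13/rawdata/journal.gz"},
        {"Key": "other-index/db_1607555060_1607550000_7/rawdata/journal.gz"},
    ]
}
print(frozen_bucket_dirs(sample, "s3-test"))

# Live check (commented out; requires AWS credentials with access to the bucket):
# import boto3
# resp = boto3.client("s3").list_objects_v2(
#     Bucket="splunk-to-s3-frozen-demo", Prefix="s3-test/")
# print(frozen_bucket_dirs(resp, "s3-test"))
```

If the function returns one directory per rolled bucket, freezing is working end to end.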

Want to learn more about Freezing Data to S3 with Splunk? Contact us today!

Driving Growth by Leveraging AWS and Document Understanding

Your company is sitting on a potential gold mine of stored data. Tucked away on servers and cloud-based drives are the answers and insights you need to take your business to that next level of growth. Advancements in machine learning and artificial intelligence have made it easier (and less expensive) to analyze this data through Document Understanding. For companies that leverage the Amazon Web Services (AWS) platform, tying in a Document Understanding initiative can have a fundamental impact on driving growth and securing a more profitable bottom line.

What is Document Understanding?

Historically, the chief hurdle to analyzing this data has been that much of it is unstructured, composed of text-based files, reports, survey results, social media posts, notes, and random PDFs. Sifting through this quagmire was expensive and inefficient, as it had to be done by hand.

That was the old way.

Fueled by natural language processing (NLP) and machine learning (ML), today’s Document Understanding systems analyze text-based documentation (PDFs, notes, reports) to uncover insights. Their machine-learning capabilities allow you to “teach” the AI how to read your specific documentation and guide its insight discovery.

How Document Understanding Can Benefit Your Enterprise Corporation

Enterprise companies are already tapping the power of AWS’s Document Understanding solution to garner essential insights into critical business functions.  Regardless of industry vertical, businesses are using Document Understanding to:

  • Instantly search for information across multiple scanned documents, PDFs, images, reports, and stored text files.
  • Redact critical information from documents and identify compliance threats in real-time.
  • Digitize, store, and analyze customer feedback and request forms.
  • Identify overarching communication trends and isolate specific messaging that can be used to improve the customer experience or marketing campaign.

And this is just the proverbial tip of the benefits iceberg. Through the machine learning aspect of Document Understanding, you can tailor your use of this technology to identify and analyze the data sets that have the most impact on your business and bottom line.

Driving Document Understanding through Intelligence with AWS Content Process Automation

As a certified AWS Advanced Consulting Partner, we are excited to announce the launch of our new AWS Content Process Automation (CPA) offering. Our new CPA tool integrates with the AWS platform to provide a structured process and streamlined toolset for implementing and managing an ongoing Document Understanding initiative.

Through our new AWS CPA offering, brands can:

  • Make previously inaccessible data actionable at scale.
  • Automate tedious but necessary business processes.
  • Improve compliance and risk management.
  • Identify opportunities to increase operational efficiency and reduce costs.

How TekStream CPA Works

Historically, analyzing sizeable unstructured data sets for actionable information has been a time-consuming and costly initiative, as most of the work had to be done manually.

Our new CPA offering leverages artificial intelligence and machine learning, along with defined scope and direction, to increase the speed and accuracy for data discovery while eliminating much of the manual aspect of data mining.

Using machine-learning services like Amazon Textract and Amazon Rekognition, TekStream CPA inspects documents, images, and video (collectively called “files”), gathers key information and insights, and automatically stores these files logically to ensure easier access to critical information. Amazon Augmented AI (Amazon A2I) routes files requiring further review to content specialists and information managers to edit associated information, take corrective actions, and approve files for storage.

TekStream CPA relentlessly and automatically investigates content to find key insights and associations that might not be easily discovered by the naked eye. Users and administrators establish business rules defining what information is important, how it will be managed, and the storage rules for documents, images, forms, video files, and unstructured data. This ensures critical business facts and figures are available for business operations.
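As an illustration of the kind of plumbing involved (a generic sketch, not TekStream CPA itself), the snippet below pulls the detected text lines out of an Amazon Textract DetectDocumentText response. The bucket and file names in the commented live call are placeholders; note also that the synchronous Textract API accepts images (PNG/JPEG), while multi-page PDFs go through the asynchronous StartDocumentTextDetection API.

```python
def extract_lines(textract_response):
    """Pull the detected text lines out of an Amazon Textract
    DetectDocumentText response (blocks with BlockType == "LINE")."""
    return [b["Text"] for b in textract_response.get("Blocks", [])
            if b.get("BlockType") == "LINE"]

# Sample response shaped like textract.detect_document_text() output:
sample = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice #4021"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "LINE", "Text": "Total due: $1,250.00"},
    ]
}
print(extract_lines(sample))

# Live call (commented out; requires AWS credentials and an image in S3):
# import boto3
# resp = boto3.client("textract").detect_document_text(
#     Document={"S3Object": {"Bucket": "my-docs-bucket", "Name": "invoice.png"}})
# print(extract_lines(resp))
```

The extracted lines are what downstream steps (classification, redaction, routing to Amazon A2I for human review) would operate on.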

Built for Growth

Analyzing your unstructured data sets is only part of the business growth equation. To achieve a true return on your investment and drive a noticeable impact on your bottom line, you also need to transform your insights into actions. By leveraging serverless technologies like AWS Lambda through our Content Process Automation tool, administrators can create functions that call their own services for file conversions, reformatting, and more to meet specific business criteria.

Start driving business growth today. TekStream has deep experience helping clients across multiple industries accelerate their digital transformation and begin leveraging the power of Document Understanding to push their business forward. Reach out to us today to learn more about what CPA and Document Understanding can do for your business.

Want to learn more about unlocking value from your unstructured data? Download our latest eBook, “9 Steps to Unlocking Value from Your Unstructured Data and Content.”

8 Benefits to Using Document Understanding to Mine Unstructured Data

What if we told you that your business was sitting on a mountain of untapped business intelligence or that hidden away in archived emails, documents, and customer survey results are the very insights you need to drive growth and improve your bottom line? These types of text-based documents are a form of “unstructured data” and (alongside image libraries, data streams, and similar data deposits) account for nearly 80% of all the data that an enterprise company generates and stores.

How do you analyze all of this data to identify the specific insights that can drive change and improve performance in your organization? Through Document Understanding.

Understanding Document Understanding

Document Understanding is one of the three core AI capabilities fueling the unstructured data analysis industry (the other two being Computer Vision and IoT analysis). This system leverages the power of natural language processing and machine learning to analyze text-based documents (PDFs, notes, reports) to uncover actionable business insights.

The machine-learning capabilities of these systems allow your organization to “teach” the AI to read your specific documentation and discover insights that are specific to your brand and audience.

8 Benefits of Analyzing Your Company’s Unstructured Data

The fact that the market size for natural language processing is estimated to reach over $16B by 2021 proves that organizations large and small are investing in tools and systems that analyze their unstructured data. This means that these companies are confident that the benefit of this work will outweigh the costs of these new systems.

While these benefits differ between industries, some of the key benefits of mining unstructured data include:

1. Finding Opportunities to Improve Your Customer Experience

Retain more customers (and win over new fans) by using Document Understanding to analyze customer surveys and reviews to identify where your company can provide better customer service.

2. Discover New Opportunities in The Market

What is the “next big thing” in your industry? How will you ensure your company will stay relevant to consumers over the next 20 years? Turn your data lake into a blue ocean by mining your unstructured data for relevant insights and consumer trends.

3.  Know Your Audience Better With Sentiment Analysis

Use these systems to gain a deeper understanding of internal and consumer audience sentiment around your brand or a specific product.

4. Make Key Decisions Faster and More Accurately

Quit getting bogged down with analysis paralysis. Get the data you need to identify and take action on the “right” decision when it counts most.

5. Improve Team Productivity and Reduce/Remove Outdated Data Processing Techniques

Through automation, you can eliminate data processing bottlenecks and instead focus your employees on more high-value tasks.

6. Identify and Eliminate Unnecessary Cost Centers

Get a handle on your waste by understanding what areas of your business are costing you money (without providing a correlating ROI).

7.  Gain a Better Understanding of Your Customer Behavior and Buying Triggers

Improve the performance of your marketing campaigns and customer retention efforts by gaining more in-depth insight into what makes your customers your customers in the first place.

8.  Avoid Costly Regulatory or Compliance Issues

Uncover regulatory or compliance issues before they negatively impact your company.

Start with The End in Mind

Ready to get started analyzing your unstructured data, but not sure where to begin? We recommend starting with the end goal in mind. What is your highest unstructured data analysis priority? Are you sitting on a mountain of customer surveys? Are you curious about where your hidden cost centers are?

Understand which aspects of your unstructured data analysis will have an immediate impact on your business’s bottom line. Then work backward to develop the tools and systems you need to discover this intelligence.

If you are not sure where to begin, we can help. We’ve helped companies across a myriad of industries turn their unstructured data into business growth rocket fuel. Contact us today to learn how we can do the same for you.

Want to learn more about how to unlock value from your unstructured data? Download our free eBook, “9 Steps to Unlocking Value from Your Unstructured Data and Content.”