The Bin Command

By: Forrest Lybarger | Splunk Consultant

 

The bin command is a relatively uncommon but incredibly useful tool in Splunk. You give it a numeric field, and Splunk groups events into ranges (bins) based on that field's value. The next important thing to know is that the timechart command calls bin behind the scenes, so only reach for bin directly when timechart can't perform the task. Below I will go through the options for the bin command and examples of how to use it.

At its most basic level, the bin command groups events into interval-based buckets. For example, "…| bin GB as bin_size" creates the bin_size field and assigns each event a range such as 0-10 or 10-20. The result looks like the screenshot below.

Without the "as bin_size" part of the command, the range value is written back to the GB field instead. As long as the field is numeric, the bin command can group events into ranges, and the command's options give users even more control over the results.
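To make the pattern concrete, here is a minimal end-to-end sketch. It assumes Splunk's internal index is available and uses the default numeric linecount field purely for illustration; substitute your own index and field:

index=_internal sourcetype=splunkd | bin linecount as bin_size | stats count by bin_size

Each row of the output is one range (for example 0-10) with the number of events that fell into it.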

 

Bins

The bins option is very simple in application. It limits the number of buckets the command can create by establishing a maximum. For example, "…| bin bins=20 linecount as bin_size" caps bin_size at 20 distinct ranges, though fewer may be produced. The results could look something like the screenshot below.

Splunk determines the size of the buckets on its own here, but there is a way to control the bucket size with other options.

 

Minspan

The minspan option lets a user set the minimum size of a bucket, so you can keep buckets from being more granular than your use case calls for. For example, "…| bin minspan=100 linecount as bin_size" prevents the results from being grouped into anything smaller than a 0-100 bucket.

 

 

Span

The span option is by far the most useful option in the bunch. It allows you to control the size of the buckets, which, when combined with the other options, gives users much more control over the bin command’s results. For example: “…| bin span=5 linecount as bin_size” creates buckets with a size of 5. The span value can be numeric, time-based, or logarithmic.
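A couple of hedged sketches of span in action (again against the internal index, purely for illustration), one numeric and one time-based:

index=_internal sourcetype=splunkd | bin span=5 linecount as bin_size | stats count by bin_size

index=_internal | bin span=1h _time | stats count by _time

The first produces ranges such as 0-5 and 5-10; the second snaps every event's _time to the start of its hour before counting.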

 

 

Start/End

The end option also controls the size of buckets, but indirectly. When given a value, end changes how the bin command automatically calculates bucket sizes by treating the end value as the highest value in the results. For example, "…| bin end=1000 linecount as bin_size" can produce one large 0-100 bucket: because bin assumes the results range from 0 to 1000, it breaks that assumed range into buckets of 100, and if every actual value falls below 100 they all land in the first bucket.

The span option overrides end.

The start option does the same for the low end of the range, and it is likewise overridden by the span option.
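An illustrative sketch of the two options together (the values are arbitrary and only show the shape of the output):

index=_internal sourcetype=splunkd | bin start=0 end=1000 linecount as bin_size | stats count by bin_size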

 

Aligntime

The aligntime option is last and is only valid when binning events by _time. It offsets where the bucket boundaries fall, and it is ignored if span is in days, months, or years. Aligntime is almost always used together with span, which sets the bucket size. For example, "…| bin span=2h aligntime=@d+1h _time as bucket" builds 2-hour buckets offset one hour from midnight, so each bucket runs from odd hour to odd hour (1:00-3:00, 3:00-5:00, and so on).
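A self-contained sketch of that example (internal index again, used only for illustration; the convert command is added just to make the bucket boundaries readable):

index=_internal | bin span=2h aligntime=@d+1h _time as bucket | stats count by bucket | convert ctime(bucket)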

 

Conclusion

While the bin command isn't the most common search command in SPL, it is very powerful in specific circumstances. If you ever need to group data into ranges but don't want to reach for a transforming command, or you hit one of the many other niche cases, the bin command can do the grouping for you. Since it is also a very underutilized command, this knowledge may well save someone else a lot of time.

Want to learn more about the bin command? Contact us today!

Masking Important Data in Your Splunk Environment

By: Aaron Dobrzeniecki | Splunk Consultant

 

If you have problems or questions regarding masking important data when it gets ingested into Splunk, this is the blog for you. Common use cases include masking credit card numbers, SSNs, passwords, account IDs, or anything else that should not be publicly visible. When masking data before it gets indexed into Splunk, make sure you test the configuration in a dev environment first whenever possible. A great website for testing regular expressions is www.regex101.com.

Both approaches hinge on the correctness of your regular expression. Splunk looks for strings that match the defined regex pattern; you can then tell Splunk to strip out the matching string, replace it entirely, or replace only part of it. The two methods below do the same thing (match a regex and replace the values), but each does it in a slightly different manner.

In the example data below, I will be masking the account IDs to only show the last four digits of the account ID. There are two ways you can mask data before it gets ingested into Splunk.

Method 1:

Using props.conf and transforms.conf to modify the data so that the first 12 characters of the account ID are replaced with "x" characters.

One sample event:

[02/Nov/2019:16:05:20] VendorID=9999 Code=D AcctID=9999999999999999

When ingested into Splunk using the below props.conf and transforms.conf the event will be indexed as so:

[02/Nov/2019:16:05:20] VendorID=9999 Code=D AcctID=xxxxxxxxxxxx9999

props.conf

[mysourcetype]

TRANSFORMS-data_mask=data_masking

 

transforms.conf

[data_masking]

SOURCE_KEY=_raw

REGEX=(^.*)(\sAcctID=)\d{12}(\d*)

FORMAT=$1$2xxxxxxxxxxxx$3

DEST_KEY=_raw

Specify the field you want Splunk to search for the matching data in with the SOURCE_KEY parameter. Splunk will attempt to match the regex specified in the REGEX setting. If it matches, Splunk replaces the matching portion with the value from FORMAT and writes the transformed value to the field specified in DEST_KEY (also _raw in this example). The values in FORMAT work as follows: each dollar-sign number refers to a capture group. In the example above there are three capture groups: (^.*) is the first, (\sAcctID=) is the second, and (\d*) is the third (it captures any remaining digits, in this case the last four of the account ID, whether or not they exist in the event). Notice that \d{12} is not inside a capture group: that is the part of the string we want to mask, so it is left out of FORMAT and replaced with the x characters.

The basis of masking your important data is making sure you have created the correct regex. In the example above I wrote a regex that encompasses the entire event. In doing so, we are able to bring back the whole event through the capture groups while ridding it of the data to be masked.
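If you want to sanity-check the pattern before touching props.conf and transforms.conf, one convenient option (this is just a search-time test, not part of the indexing-time method itself) is to run the substitution against a fabricated copy of the sample event using makeresults and rex in sed mode:

| makeresults | eval _raw="[02/Nov/2019:16:05:20] VendorID=9999 Code=D AcctID=9999999999999999" | rex mode=sed "s/(AcctID=)\d{12}(\d{4})/\1xxxxxxxxxxxx\2/"

If the regex is right, the returned _raw shows the masked account ID, and you can then move the equivalent pattern into the transform.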

Another way to mask important data before it is ingested into Splunk is to use SEDCMD to replace the sensitive text with X's (or whatever you want to use to show that the data has been masked). Using the same sample event above, we will get the same result as before, but with a different method.

Method 2:

props.conf

[mysourcetype]

SEDCMD-replace=s/AcctID\=\d{12}/AcctID=xxxxxxxxxxxx/g

The above props.conf will mask the data as desired. The key here is to make sure that the replacement string (the third segment of the sed expression) contains the part you want to keep plus the mask, while the search pattern (the second segment) matches only the text you want to get rid of. With SEDCMD, Splunk replaces whatever the search pattern matches with the replacement you specify.
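Reading the expression from this example segment by segment (an informal breakdown following standard sed syntax):

s/AcctID\=\d{12}/AcctID=xxxxxxxxxxxx/g

search pattern: AcctID\=\d{12} (matches "AcctID=" followed by the first 12 digits of the account ID)

replacement: AcctID=xxxxxxxxxxxx (what Splunk writes in place of the match; the last four digits are never matched, so they pass through untouched)

flag: g (apply the substitution to every match in the event)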

In conclusion, there are two ways to anonymize data with Splunk Enterprise:

Use the SEDCMD like a sed script to do replacements and substitutions. The sed script method is easier to do, takes less time to configure, and is slightly faster than a transform. But there are limits to how many times you can invoke SEDCMD and what it can do.

Use a regular expression transform (method 1). This method takes longer to configure, but is easier to modify after the initial configuration and can be assigned to multiple data inputs more easily.

Want to learn more about masking important data in your Splunk environment? Contact us today!

Create Splunk Indexes and HEC Inputs with Ansible

By: Brandon Mesa | Splunk Consultant

Managing Splunk .conf files is a day-to-day routine for most, if not all, Splunk admins. As your Splunk environment matures, you'll find yourself making constant .conf changes to improve operational efficiency. For example, as new data sources are onboarded, new indexes and parsing settings are implemented to maintain efficiency and keep the appropriate data segregation controls in place. To give users access to this new data or index, you might also have to create a new role, or manage an existing one, in order to grant the appropriate data permissions to a specific set of users. You may also explore alternate data inputs, such as the HTTP Event Collector (HEC).

Manually completing these tasks can become time-consuming and error-prone. While you can't automate every change on the back end, you can standardize many of the common configuration changes, such as creating a new index, a role, or a HEC token. You can use a variety of automation tools to manage your .conf files and reduce the time spent making manual changes. This blog will show you how to use Ansible playbooks to automate common Splunk tasks, including index and HEC input creation.

To keep this blog simple, examples will be applied to a local standalone instance in the $SPLUNK_HOME/etc/system/local path. The location of .conf changes will vary depending on your specific environment.

The following Ansible playbooks are used in this blog:

create_index.yaml

 

create_hec_token.yaml

 

Create an Index

To create a new index with Ansible playbooks, run the following command:

% ansible-playbook create_index.yaml -e '{"index_name":"ansible_index"}'

Shown below, you can see the new index "ansible_index" has now been created in indexes.conf.
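The screenshot isn't reproduced here, but the resulting stanza would look something like the following. The path settings are just the conventional Splunk defaults and are illustrative; your playbook may write different values:

[ansible_index]
homePath = $SPLUNK_DB/ansible_index/db
coldPath = $SPLUNK_DB/ansible_index/colddb
thawedPath = $SPLUNK_DB/ansible_index/thaweddb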

 

If you run the playbook again with an index name that already exists, the playbook stops and returns a message rather than creating a duplicate stanza. For example, if we try to create the "ansible_index" index a second time, the playbook exits with the following message:

“ansible_index – Index string already found in indexes.conf”

 

Take a look at the returned message for the “Confirm if index already exists” task. The playbook reads the indexes.conf file and looks for the index_name variable passed at the time the CLI command is run. If the string is found in the file, the playbook skips the stanza creation.
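The playbook itself appeared as a screenshot in the original post, so here is a minimal sketch of what a create_index.yaml built around that check could look like. This is an assumption, not the exact playbook used in this blog: the module choices, task names, and paths are illustrative, and it targets a local standalone instance as described above.

---
# Illustrative sketch: add an index stanza to indexes.conf unless it already exists.
# index_name is expected from the command line, e.g. -e '{"index_name":"ansible_index"}'.
- name: Create a Splunk index
  hosts: localhost
  become: true
  vars:
    indexes_conf: /opt/splunk/etc/system/local/indexes.conf
  tasks:
    - name: Confirm if index already exists
      command: grep -q "^\[{{ index_name }}\]" {{ indexes_conf }}
      register: index_check
      changed_when: false
      failed_when: false

    - name: Stop if the index string is already in indexes.conf
      fail:
        msg: "{{ index_name }} - Index string already found in indexes.conf"
      when: index_check.rc == 0

    - name: Add the new index stanza
      blockinfile:
        path: "{{ indexes_conf }}"
        create: true
        marker: "# {mark} {{ index_name }}"
        block: |
          [{{ index_name }}]
          homePath   = $SPLUNK_DB/{{ index_name }}/db
          coldPath   = $SPLUNK_DB/{{ index_name }}/colddb
          thawedPath = $SPLUNK_DB/{{ index_name }}/thaweddb
      when: index_check.rc != 0

The create_hec_token.yaml playbook presumably follows a similar pattern but talks to Splunk's REST API instead, which is why its invocation below passes admin credentials.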

 

Create a HEC Token

We've created a new index for all the Ansible-related data. Now let's create a new HEC input that will constrain incoming data to the new index. To create a new HEC token, run the following Ansible playbook:

% ansible-playbook create_hec_token.yaml -e '{"username":"admin","password":"Pa$$w0rd","token_name":"ansible_token","index":"ansible_index","indexes":"ansible_index"}'

Playbook execution will look something like this:

 

Now let’s validate our token has been created:
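The original post shows the validation as a screenshot from Splunk Web. If you would rather check from the command line, here are two hedged examples; they assume the default management and HEC ports (8089 and 8088), and <token value> is a placeholder for the GUID the playbook generated:

% curl -k -u admin:'Pa$$w0rd' https://localhost:8089/services/data/inputs/http

% curl -k https://localhost:8088/services/collector/event -H "Authorization: Splunk <token value>" -d '{"event":"hello from ansible","index":"ansible_index"}'

The first lists the configured HEC inputs over the REST API; the second sends a test event through the new token into ansible_index.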

 

Automation tools can facilitate day-to-day operations across your Splunk infrastructure. Not every .conf change will be automated in your environment, since you'll come across unique use cases that require specific configurations. However, you can automate some of the common manual tasks, like the ones shown above, to reduce the time spent and avoid simple mistakes.

Want to learn more about creating Splunk indexes and HEC inputs with Ansible? Contact us today!

 

 

Press Release: TekStream Makes INC. 5000 List for Sixth Consecutive Year

For the 6th Time, Atlanta-based Technology Company Named One of the Fastest-growing Private Companies in America with Three-Year Sales Growth of 131%

Atlanta-based technology company, TekStream Solutions, is excited to announce that for the sixth time in a row, it has made the Inc. 5000 list of the fastest-growing private companies in America. Only 2.15% of companies have made the list six times. This prestigious recognition comes again just nine years after Rob Jansen, Judd Robins, and Mark Gannon left major firms and pursued a dream of creating a strategic offering to provide enterprise technology software, services, solutions, and sourcing. Now, they’re a part of an elite group that, over the years, has included companies such as Microsoft, Timberland, Vizio, Intuit, Chobani, Oracle, and Zappos.com.

“Being included in the Inc. 5000 for the sixth straight year is something we are truly proud of as very few organizations in the history of the Inc. 5000 list since 2007 can sustain the consistent and profitable growth year over year needed to be included in this prestigious group of companies,” said Chief Executive Officer, Rob Jansen. “Continued adoption by our clients for cloud-based technologies, Security, and Big Data solutions to solve complex business problems has been truly exciting. We are helping our clients take advantage of today’s most advanced recruiting and technology solutions to digitally transform their businesses and address the ever-changing market.”

This year’s Inc. 5000 nomination comes after TekStream has seen a three-year growth of over 131%, and 2020 is already on pace to continue this exceptional growth rate even amidst the impact of COVID-19 and the global pandemic.

"The pandemic has moved the demand for digital transformation to cloud technologies from high priority to absolutely critical. Overnight, customers have been forced to establish new channels of remote collaboration just to maintain normal business functions. Preserving revenue streams and managing high operational costs is more important than ever," said Judd Robins, Executive Vice President. "The economic pause has created a window for companies to take another look at legacy technology debt, evaluate more cost-effective cloud options, and retool their platforms ahead of the rebound. TekStream's continued growth is being fueled by those efforts, and we're happy to take the lead position with our customers."

To qualify for the award, companies had to be privately owned, have been established in the first quarter of 2015 or earlier, have experienced two-year sales growth of more than 50 percent, and have garnered revenue between $2 million and $300 million in 2019.

"The prestigious recognition in trying times speaks to our team's commitment to adapt Recruiting, RPO, and Technology solutions to our client needs and the many relationships we service on both the candidate and client side of our business. As economic conditions change, we look forward to the challenge of raising our level of service to meet the expectations of our internal and consulting staff, as well as positively impacting client and candidate hiring experiences," said TekStream Executive Vice President of Talent Management and Recruiting Services, Mark Gannon.

TekStream accelerates clients' digital transformation by navigating complex technology environments with a combination of technical expertise and staffing solutions. We guide clients' decisions, quickly implement the right technologies with the right people, and keep them running for sustainable growth. Our battle-tested processes and methodology help companies with legacy systems get to the cloud faster, so they can be agile, reduce costs, and improve operational efficiencies. And with hundreds of deployments under our belt, we can guarantee on-time and on-budget project delivery. That's why 97% of clients are repeat customers. For more information, visit https://www.tekstream.com/.