By: Forrest Lybarger | Splunk Consultant
The bin command is a relatively uncommon but incredibly useful tool in Splunk. A user gives it a numeric field, and Splunk groups the events into ranges (buckets) of that field's values. The next important thing to know about bin is that the timechart command calls the bin command behind the scenes, so only use the bin command directly if the timechart command can’t perform the task. Below I will go through the options for the bin command and examples of how to use each one.
At its most basic level, the bin command groups events into buckets based on intervals of a field's values. For example: “…| bin GB as bin_size” creates the bin_size field and assigns each event a range such as 0-10 or 10-20. The result looks like the screenshot below.
Without the “as bin_size” part of the command, the range value is assigned to the GB field instead, replacing the original value. So long as the field is numeric, the bin command can group events along ranges, but the command’s options give users more control over the results.
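To make the difference between using “as bin_size” and leaving it out concrete, here is a minimal Python sketch of the idea. It is not Splunk's implementation: the helper name bin_field is hypothetical, and a fixed bucket width of 10 is assumed, whereas Splunk picks the width itself when no options are given.

```python
import math

def bin_field(events, field, span, alias=None):
    """Return copies of the events with a range label for `field`.

    Hypothetical helper: a fixed `span` is assumed here, whereas Splunk
    chooses the bucket size itself when no options are given.
    """
    target = alias or field  # without an alias the label overwrites the field
    binned = []
    for event in events:
        low = math.floor(event[field] / span) * span  # lower edge of the bucket
        labeled = dict(event)                         # leave the input untouched
        labeled[target] = f"{low}-{low + span}"
        binned.append(labeled)
    return binned

events = [{"GB": 3}, {"GB": 14}]
with_alias = bin_field(events, "GB", 10, alias="bin_size")
without_alias = bin_field(events, "GB", 10)
```

With the alias, each event keeps its original GB value and gains a bin_size range. Without it, the GB value itself is replaced by the range.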
The bins option is very simple in application. It sets a maximum on the number of buckets the command can create. For example: “…| bin bins=20 linecount as bin_size” limits bin_size to at most 20 distinct values, though the results may contain fewer than 20. The results could look something like the screenshot below.
Splunk determines the size of the buckets on its own here, but there is a way to control the bucket size with other options.
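One way to picture why bins is a maximum rather than a target is the sketch below. It assumes that candidate bucket sizes are powers of ten and that the smallest one that fits within the limit wins. That is an assumption made for illustration, and Splunk's actual selection logic may differ. The function name choose_span is hypothetical.

```python
import math

def choose_span(low, high, max_bins):
    """Smallest power-of-ten span that covers [low, high] in at most max_bins buckets.

    Assumption: candidate spans are 1, 10, 100, ... and the field holds
    integers. This illustrates the idea only, not Splunk's algorithm.
    """
    span = 1
    while True:
        first = math.floor(low / span)   # index of the bucket holding the lowest value
        last = math.floor(high / span)   # index of the bucket holding the highest value
        if last - first + 1 <= max_bins:
            return span
        span *= 10                       # too many buckets: try the next size up

span_small_range = choose_span(0, 150, 20)
span_large_range = choose_span(0, 350, 20)
```

Under these assumptions, values from 0 to 350 with a limit of 20 end up in buckets of 100, which yields only 4 buckets. This is the "might not reach 20 values" effect described above.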
The minspan option lets a user set the minimum size of buckets. This means that you can prevent buckets from being too granular for your use case. For example: “…| bin minspan=100 linecount as bin_size” prevents the results from being grouped into buckets narrower than 100, so the smallest possible bucket is 0-100.
The span option is by far the most useful option in the bunch. It lets you set the size of the buckets directly, which, when combined with the other options, gives users much more control over the bin command’s results. For example: “…| bin span=5 linecount as bin_size” creates buckets with a size of 5. The span value can be numeric (span=5), time-based (span=1h), or logarithmic (span=log2).
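The difference between a numeric span and a logarithmic span can be sketched in Python as follows. The function names are hypothetical and the label format is illustrative. The sketch only shows where the bucket edges fall, not how Splunk renders them.

```python
import math

def linear_bucket(value, span):
    """Fixed-width bucket: edges at multiples of span."""
    low = math.floor(value / span) * span
    return f"{low}-{low + span}"

def log_bucket(value, base):
    """Logarithmic bucket: edges at consecutive powers of base (value >= 1)."""
    if value < 1:
        raise ValueError("this sketch only handles values >= 1")
    low = 1
    while low * base <= value:   # climb through powers of base until the next one overshoots
        low *= base
    return f"{low}-{low * base}"

fixed = [linear_bucket(v, 5) for v in (3, 5, 12)]
logarithmic = [log_bucket(v, 2) for v in (1, 7, 8)]
```

Fixed-width buckets all have the same size, while logarithmic buckets double in width at each step when the base is 2. That makes them a better fit for values spread over several orders of magnitude.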
The end option also controls the size of buckets, but indirectly. When given a value, end changes how the bin command automatically calculates bucket sizes by making it treat that value as the highest value in the range. For example: “…| bin end=1000 linecount as bin_size” makes the bin command assume the results range from 0 to 1000 and break that range into buckets of 100, so events with a linecount under 100 all fall into one large 0-100 bucket.
The span option overrides end.
The start option works the same way on the lowest value of the range, and it is also overridden by the span option.
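The effect of start and end on the range that bin sizes its buckets from can be sketched as below. The sketch assumes, based on my understanding of Splunk's documentation, that these options can only widen the range found in the data and never shrink it. The function name effective_range and the sample values are hypothetical.

```python
def effective_range(values, start=None, end=None):
    """Range used for automatic bucket sizing when no span is given.

    Assumption: start and end can only widen the range seen in the data,
    so an end below the data's maximum has no effect.
    """
    low, high = min(values), max(values)
    if start is not None:
        low = min(low, start)    # start may pull the lower edge down
    if end is not None:
        high = max(high, end)    # end may push the upper edge up
    return low, high

linecounts = [3, 42, 87]
widened = effective_range(linecounts, end=1000)
unchanged = effective_range(linecounts, end=50)
lowered = effective_range(linecounts, start=0)
```

A wider range leads the automatic sizing to pick larger buckets, which is why end=1000 can collapse small values into a single bucket.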
The aligntime option is last and is only valid when binning events by _time. It offsets where the bucket boundaries fall and is ignored if span is in days, months, or years. Aligntime is almost always used together with span, which sets the bucket size. For example: “…| bin span=2h aligntime=@d+1h _time as bucket” builds 2-hour buckets with a 1-hour offset, so each bucket runs from one odd hour to the next (01:00 to 03:00, 03:00 to 05:00, and so on).
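The offset arithmetic behind span=2h with aligntime=@d+1h can be sketched in Python. The function name time_bucket and the date 2024-01-01 are hypothetical, and the alignment point is written out by hand as midnight plus one hour to stand in for @d+1h.

```python
from datetime import datetime, timedelta

def time_bucket(ts, span, align):
    """Start of the span-sized bucket containing ts, with edges anchored at align."""
    whole_spans = (ts - align) // span   # floored count of spans between align and ts
    return align + whole_spans * span

span = timedelta(hours=2)
align = datetime(2024, 1, 1, 1, 0)       # stands in for @d+1h: midnight plus 1 hour

afternoon = time_bucket(datetime(2024, 1, 1, 14, 30), span, align)
before_align = time_bucket(datetime(2024, 1, 1, 0, 30), span, align)
```

An event at 14:30 lands in the bucket starting at 13:00, and an event at 00:30 lands in the bucket starting at 23:00 the previous day, so every boundary sits on an odd hour.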
While the bin command isn’t the most common search command in SPL, it is very powerful in specific circumstances. If you ever need to group data without using a transforming command, or run into one of many other niche cases, you can use the bin command to group events. Since it is also a very underutilized command, this added knowledge could save someone else a lot of time.