By: Yetunde Awojoodu | Splunk Consultant
Have you ever come across fields with multiple values in your event data in Splunk and wondered how to modify them to get the results you need? Each field in an event typically has a single value, but for events such as email logs you can often find multiple values in the “To” and “Cc” fields. Multivalue fields can also result from data augmentation using lookups. To properly evaluate and modify multivalue fields, Splunk has some multivalue search commands and functions. If you ignore multivalue fields in your data, you may end up with missing and inaccurate data, sometimes reporting only the first value of the multivalue field(s) in your results.
In this article, I have applied a simple scenario to illustrate how different multivalue commands and functions can be used individually or combined to meet different use cases. I will cover some common search commands and functions that work with multivalue fields. Note that multivalue functions can be used with eval, where or fieldformat search commands. In my illustrations, I employed the “makeresults” command to generate hypothetical data for my searches so that anyone can recreate them without the need to onboard data. Read more on the makeresults command.
Within one purchase transaction, Mary bought eggs, milk and bread. She paid for the eggs with cash and covered the remaining items using her credit card. We can assume that this purchase transaction is equivalent to a log event. The values for each multivalue field are separated by the comma delimiter.
Please note that in all the results, I have deliberately excluded the default field, “_time” which is a default field generated when the makeresults command is used.
This command is used to split the values of a field that appear like a single value into multiple values within an event based on the delimiter. A delimiter specifies the boundary between characters.
The values in the “groceries” field have been split within the same event based on the comma delimiter. The values in the “payment” field remain the same. The report shows the method of payment for all three grocery items but it does not specify the actual payment method used for each item. To expand the event into three separate events, one for each item and show the exact payment for each grocery item, we will need a combination of commands and functions.
The mvzip function is used to tie corresponding values in the different fields of an event together. This helps to keep the association among the field values. This function takes two multivalue fields, X and Y, and combines them by stitching together the first value of X with the first value of field Y, then the second with the second, and so on.
The new field, “zipped” is the result of the mvzip function. The values of the groceries and payment fields are properly zipped together before expanding into separate events. Note that at this point, the results are still within one event.
This command expands the values of a multivalue field into separate events, one event for each value in the multivalue field. All other single field values and unexpanded multivalue field values will remain the same in each new event.
Mvexpand works great at splitting the values of a multivalue field into multiple events while keeping other field values in the event as is but it only works on one multivalue field at a time. For instance, in the above example, mvexpand cannot be used to split both “zipped” and “payment” fields at the same time. The next function will come in handy to accomplish this.
Having zipped the values and created one field, “zipped”, you can now expand the “zipped” field into multiple events. The mvindex function is a little more intricate. To further tie field values together so that accurate associations are made in the process of expanding the values into separate events, mvindex will separate the existing multivalued field into two chosen fields using index values. Indexes can start at zero if labeling from the first value. For example, if values= a,e,i,o,u; a=0 e=1 i=2 o=3 u=4. You could also label from the last character with -1; a=-5 e=-4 i=-3 o=-2 u=-1 or you could choose to have a combination of both index patterns; a=0 e=1 i=2 o=-2 u=-1.
Mvindex was used to assign index 0 to the first value in the group which represents groceries and index 1 to the second value representing payment method so that when the fields are split, the values will not get mixed up. The “split” command was used to separate the values on the comma delimiter. Using mvindex and split functions, the values are now separated into one value per event and the values correspond correctly.
Tip – The stats command can also be used in place of mvexpand to split the fields into separate events as shown below:
This function can be used to quickly determine the number of values in a multivalue field using the delimiter. If the field contains a single value, the function returns 1 and if the field has no values, the function returns NULL.
As with single value fields, keep in mind that you may need a combination of multivalue commands/functions to get your report in the required format that will meet your specific use case.
Note: If there are situations in your data where a field is sometimes multivalue and other times null, refer here
Want to learn more about working with multivalue fields in Splunk? Contact us today!