A Use Case for Ingest Time Eval
By: Zubair Rauf | Senior Splunk Consultant
A few days ago, a customer put an interesting challenge in front of me, one they had been facing for some time. The customer works with an app that logs all of its events 7 hours ahead of Eastern time, irrespective of daylight saving time: the server clock resets to midnight when Eastern time is 5:00 PM, all year round. To work around this problem and keep the events synced with the correct time zone, they adjusted the sourcetype for those logs every time daylight saving time started or ended.
When presented with this problem, I spent a good amount of time trying to find a time zone that changes with Eastern time when daylight saving time changes and has the same offset as those logs. Having no success on that front, I started looking at alternatives and came across a powerful way to solve the problem with a one-time fix to the sourcetype.
Splunk introduced ingest time evals in Splunk Enterprise 7.2. They are similar to the search time evals that have helped make Splunk the powerful tool it has always been. Ingest time evals allow you to write an EVAL expression that is executed at ingestion time to create a new indexed field or to update an existing field's value. They also give you more control over Splunk index time fields. In my particular case, being able to manipulate index time fields was exactly what I needed to solve my customer's problem.
For starters, _time is an index time field that is parsed from the raw log event. If the event does not have a timestamp, the indexer assigns it the current time when the event is ingested. In my particular challenge, the _time field needed a fixed offset of seven hours, because the logs were seven hours ahead of Eastern time.
To set up ingest time evals, we have to work with transforms.conf, props.conf, and fields.conf (the last only if creating new fields at ingest time). To illustrate the process of creating new index time fields or manipulating existing fields at index time, I use a sample log from a Cisco device.
To do a comparison, I ingested the log file with a custom sourcetype I created to parse the events.
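The sourcetype definition itself is not reproduced in this post, so the following is only a minimal sketch of what such a props.conf stanza might look like. The stanza name cisco:custom is hypothetical, single-line events are an assumption, and TIME_FORMAT is inferred from the timestamp style quoted later in this post (01/16/2020 11:43:31 AM). TIME_PREFIX is left out because the position of the timestamp in the raw event is not shown.

```ini
# props.conf (sketch; the stanza name is hypothetical)
[cisco:custom]
# Assumption: every line in the log file is a separate event
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# Assumption: timestamps look like 01/16/2020 11:43:31 AM
TIME_FORMAT = %m/%d/%Y %I:%M:%S %p
```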
With the above sourcetype, the following events were ingested.
If you look closely, the date/time was parsed exactly as it appears in the raw log event. Now if the raw event had a timestamp that needed to be offset, we could change the _time field at ingest time using ingest time eval.
To make my required changes, I will have to add an INGEST_EVAL expression in a transforms stanza in transforms.conf to update the _time field at ingest time after it has been parsed out from the actual event.
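A minimal sketch of such a configuration is shown here. The transform name time-offset matches the one used later in this post, while the sourcetype name cisco:custom is a hypothetical placeholder for your own sourcetype.

```ini
# transforms.conf
[time-offset]
# Add 7200 seconds to the parsed timestamp
INGEST_EVAL = _time:=_time+7200

# props.conf (the sourcetype name is hypothetical)
[cisco:custom]
TRANSFORMS = time-offset
```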
In the above example, I have used INGEST_EVAL to add 7200 seconds, which is 2 hours, to my _time field. I have also used “:=” instead of “=” so that Splunk overwrites the existing value of _time. With “=”, Splunk would add a second value, resulting in a multivalued _time field in the final event.
The above screenshot shows the updated _time field after the same log file has been ingested with the updated props and transforms. If you look closely at the Time column in the first event, the timestamp is parsed as 01/16/20 1:43:43 PM, but the timestamp in the event is 01/16/2020 11:43:31 AM. This tells us that the INGEST_EVAL expression in our transforms.conf worked.
At this point, I would caution you to thoroughly test your INGEST_EVAL on a dev Splunk server so that you are sure that your eval works.
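For the customer's logs described at the start of this post, the same pattern would use a negative offset, since seven hours is 7 × 3600 = 25200 seconds. This is only a sketch: the stanza name eastern-offset is hypothetical, and it assumes the sourcetype already parses the timestamps as Eastern time (for example with TZ = America/New_York in props.conf), so that daylight saving changes are handled by the time zone and the eval removes only the fixed lead.

```ini
# transforms.conf (sketch; the stanza name is hypothetical)
[eastern-offset]
# Assumption: _time was parsed as Eastern time and is 7 hours ahead
# 7 hours * 3600 seconds = 25200 seconds
INGEST_EVAL = _time:=_time-25200
```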
Ingest time eval can also be used to create new index time fields. While updating the _time field to offset the time difference, I thought about creating some custom index fields for demonstration purposes. This would further demonstrate how powerful ingest time evals are and how they can be useful.
Considering I was updating the _time field with my new timestamp, I figured it would be good to have a field that still parses and stores the original time. I named that field orig_time. This field is basically derived from the original _time field that was parsed before it was changed into the new timestamp.
I also thought it would be good to calculate the raw length of the event at ingest time, as that would give me a field for calculating the size of the ingested data later. I leaned towards demonstrating this because, not too long ago, I was faced with the challenge of reporting host-level licensing information for every index. This helps Splunk users in an organization understand how much data their hosts are sending to Splunk.
Now, this is an easy fix if your environment is small. In that case, you can use the license_usage.log file available in the _internal index to calculate your license usage by index, sourcetype, source, or host. It becomes a problem when your environment grows too large. When the number of unique index, sourcetype, source, and host tuples exceeds 2000 (the default threshold), the license manager starts squashing the source and host values, and only the index and sourcetype values remain in license_usage.log.
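For the small-environment case, a search along the following lines is one way to break usage down. It relies on the standard fields in license_usage.log (b for bytes, idx for index, st for sourcetype, h for host); the time range and the GB rounding are illustrative choices.

```spl
index=_internal source=*license_usage.log* type=Usage earliest=-1d@d latest=@d
| stats sum(b) AS bytes BY idx, st, h
| eval GB = round(bytes / 1024 / 1024 / 1024, 3)
| sort - GB
```

Once squashing kicks in, the h values come back empty, which is exactly the limitation described above.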
To work around this issue, I set up a daily license usage search that calculates the length of _raw for the past day across all indexes and stores the result in a summary index. This search runs at off-peak hours, when the system is not being used by other users. That lets me populate my dashboards on demand for the users who want to see this data the next day.
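The exact search is not shown in this post, so the following is only a rough sketch of that kind of summary-generating search. The summary index name license_summary is hypothetical and must already exist, and len(_raw) counts characters, so it only approximates the licensed byte volume.

```spl
index=* earliest=-1d@d latest=@d
| eval event_size = len(_raw)
| stats sum(event_size) AS bytes BY index, host
| collect index=license_summary
```

Because this reads every raw event from the previous day, it is expensive on a large deployment, which is why it is scheduled for off-peak hours.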
Having the raw event size calculated for every event at index time will help me get rid of those expensive searches that need to run every night. Those searches can also be less reliable, for example if the search head that runs the summary-generating search crashes. At index time I create a new field, “event_size”, using INGEST_EVAL in transforms.conf. The settings used to do this are as below:
If you look closely at the settings:
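The original settings are not reproduced here, so this is a reconstruction based on the description that follows. The transform and field names come from this post, the eval expressions for orig_time and event_size are my best guess at what was used, and the sourcetype name cisco:custom is hypothetical.

```ini
# transforms.conf
[orig-time]
# Copy the parsed timestamp before it is changed
INGEST_EVAL = orig_time=_time

[time-offset]
# Overwrite _time with the offset value (7200 seconds = 2 hours)
INGEST_EVAL = _time:=_time+7200

[event-size]
# len() returns the number of characters in the raw event
INGEST_EVAL = event_size=len(_raw)

# fields.conf
[orig_time]
INDEXED = true

[event_size]
INDEXED = true

# props.conf (the sourcetype name is hypothetical)
[cisco:custom]
TRANSFORMS = orig-time,time-offset,event-size
```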
I have added two new stanzas to the transforms.conf to create the evals for the new fields, orig_time, and event_size.
As we are creating two new fields at ingest time, we add their names as stanzas in fields.conf and make sure these fields are indexed by adding the parameter “INDEXED = true”.
I have updated the TRANSFORMS parameter in the relevant sourcetype. Note that the order of the transform names in the TRANSFORMS setting dictates which transform is applied first to the data being parsed. In this particular case, the setting is:
TRANSFORMS = orig-time,time-offset,event-size
With this setting, the transforms are applied in the following order:
- orig-time will preserve the original parsed time in the orig_time field.
- time-offset will update the existing _time field to be offset by 2 hours.
- event-size will calculate the total length of the event and create a new event_size field.
If you look closely at the final screenshot (above), on the left under “Interesting Fields” you will see two new fields: orig_time and event_size.
Now, to calculate the total license usage by any measure, you can use event_size with a | tstats search, which will be many times faster than a regular search.
There can be many other uses for ingest time evals, some of which are listed on the documentation page. To find out more, please visit the Splunk documentation at https://docs.splunk.com/Documentation/Splunk/8.0.2/Data/IngestEval#Why_use_ingest-time_eval.3F
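As a rough sketch, a tstats search over the indexed event_size field could look like the following. The time range, the split by index and host, and the GB rounding are illustrative choices, and because event_size is a character count it approximates rather than exactly matches the licensed volume.

```spl
| tstats sum(event_size) AS bytes WHERE index=* earliest=-1d@d latest=@d BY index, host
| eval GB = round(bytes / 1024 / 1024 / 1024, 3)
| sort - GB
```

tstats reads the indexed fields directly instead of scanning raw events, which is where the speed-up comes from.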
If you want to learn more or have TekStream help with implementing some Splunk use cases, contact us today!