How Datamodel Works in Splunk ES

By Kamal Dorairaj, Senior Splunk Consultant

A datamodel is much like a saved search: it provides structure to underlying unstructured data. A datamodel contains multiple datasets, and each dataset is like a table in a traditional database. In Splunk, we create each dataset with a set of constraints. This blog walks through the end-to-end flow of a datamodel in Splunk ES. By the end of this demonstration, the above definition should make sense.

Let’s take the “Authentication” datamodel as an example:

Step One

Log in to your Splunk ES search head. Navigate to Settings -> Data models and click on the “Authentication” DM. We can see the macro and the base constraint for the DM. Then click on the Failed Authentication dataset. Now we can see a new constraint on top of the base constraint. The base constraint does not change. The child dataset matches only events that satisfy both constraints: events in the scoped indexes that carry the tag authentication defined in the base constraint and also the “action=failure” defined in the child constraint.
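As a rough sketch of how the two constraints stack, the search below combines a simplified base constraint with the child constraint for failed authentication. The macro name `cim_Authentication_indexes` is an assumption based on the usual CIM naming, and the real base constraint may contain additional clauses, so compare it with what your own Authentication DM shows.

```spl
`cim_Authentication_indexes` tag=authentication action="failure"
| stats count by index, sourcetype
```

If this returns nothing over a time range where failed logins are known to exist, one of the two constraints is not being met by the data.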

Note down the macro used in the Authentication datamodel and proceed to Step Two.

Step Two

Go to ES -> Configure -> CIM Setup -> click on the Authentication DM. At the bottom is the index whitelist: only these indexes are scoped for the DM, i.e., this is the only data coming into this datamodel. Now we have to ask ourselves: do all the indexes included here have authentication events as desired? Is there anything we want that is not added here? Add or remove indexes according to your authentication log sources.
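One way to answer the "is anything missing?" question is to look for authentication-tagged events that sit outside the whitelist. This is only a sketch: it assumes the whitelist macro is named `cim_Authentication_indexes` and is not empty, and because it searches index=* it should be run over a short time range.

```spl
index=* tag=authentication NOT `cim_Authentication_indexes`
| stats count by index, sourcetype
| sort - count
```

Any index returned here holds events tagged authentication that the DM currently ignores, which makes it a candidate for the whitelist.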

Step Three

Let’s say we are sending authentication logs to an index called “abc”, but we are not seeing any events in the Authentication DM.

Then let’s do a search on that index -> “index=abc | stats count by tag”

If we go back to the Authentication DM constraint, it requires the tag authentication. But that tag is not present in the “abc” index above, which means the base constraint for the DM is not met for this index.

Let’s do index=abc | stats count by action

We can see values in the action field, so the data is there but the tag is missing. We have to create the tag authentication so that these events show up in the Authentication datamodel.
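The two checks above can be combined into a single search that lists the sourcetypes in the index that have an action value but no authentication tag. It uses only the example index “abc” from this step.

```spl
index=abc action=* NOT tag=authentication
| stats count by sourcetype, action
```

Each sourcetype returned is a candidate for a new eventtype and tag, which is what Step Four covers.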

Step Four

All TAs work through eventtypes. Let’s go to Settings -> Event types, search for authentication, and sort by tags. When the accelerated Authentication DM runs, it looks for the tag and runs all the eventtypes associated with the authentication tag. If no eventtype carrying the authentication tag matches our data, the DM will not return any of it.

Let’s run -> “index=abc action=success”

This shows the list of sourcetypes. Check which sourcetype has the data we want.

Go back to the eventtypes -> create a new eventtype and, in the search string, enter sourcetype=<that we are looking for>. We can also use the index in the search string.

Go to the tag field -> type authentication. Now when we search for the eventtype, it returns the data for that sourcetype. We don’t have to mention the index in the eventtype, but the more tightly you scope it, the better.
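To confirm that the new eventtype and tag are being applied, a quick check along these lines may help. The eventtype name `abc_authentication` is hypothetical; substitute the name you gave the eventtype.

```spl
index=abc eventtype=abc_authentication
| stats count by sourcetype, eventtype, tag
```

If the tag authentication appears in the results, the events now meet the tag part of the base constraint. If the search returns nothing, recheck the eventtype search string and its permissions.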

Step Five

The above link lists the ES dashboards along with the DMs and fields each one depends on. If our raw data doesn’t use the field name defined for the DM, create a field alias for it. In general, not all fields in our data will be CIM compliant.

We have to check the raw data and decide which of its fields can map to a datamodel field. The prescribed values are the only values that a particular field should contain; no other values are allowed.
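Before creating a permanent field alias, the mapping can be prototyped at search time. In the sketch below, `account_name` is a hypothetical raw field standing in for whatever your data calls the user, and the list of prescribed action values (success, failure, pending, error) is our understanding of the CIM Authentication model, so verify it against the CIM documentation for your version.

```spl
index=abc tag=authentication
| eval user=coalesce(user, account_name)
| fillnull value="(missing)" action
| eval action_check=if(in(action, "success", "failure", "pending", "error"), "prescribed", "not prescribed")
| stats count, dc(user) as distinct_users by action, action_check
```

Rows marked “not prescribed” point to values that need normalizing before the DM and the ES dashboards will treat them correctly.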

Step Six

a.) |datamodel Authentication

This shows if the datamodel is functional.

b.) |from datamodel:Authentication

This shows all the data in the datamodel. Click on the “user” field on the left side to see its values. Check whether all the data is extracted correctly. If we see ‘unknown’, the data is not extracted correctly. The fields listed on the left side are the fields that are part of the datamodel.

c.) | datamodel Authentication Authentication flat
| table *
| fields - date_* host index punct _raw time* splunk_server sourcetype source eventtype linecount
| fieldsummary

This runs fieldsummary over the datamodel search and summarizes the values of each field. It helps to find the fields in the DM that contain “unknown” values.

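d.) | datamodel Authentication Authentication search
| datamodel Authentication Authentication acceleration_search_string

These are two separate searches. The first returns the datamodel events; the second returns the search string that the accelerated DM runs.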
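Two further checks may be useful here, both offered as sketches rather than required steps. The first uses tstats to confirm that the acceleration summaries actually contain events from the expected indexes and sourcetypes:

```spl
| tstats summariesonly=true count from datamodel=Authentication by index, sourcetype
```

The second narrows down which sourcetypes produce ‘unknown’ for the user field, so the field extractions or aliases for those sourcetypes can be fixed:

```spl
| from datamodel:Authentication
| search user="unknown"
| stats count by sourcetype
```

If the example index “abc” is missing from the first result, the summaries may not have caught up yet, or the events are still failing the DM constraints.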

Step Seven

a.) If a search on the sourcetype alone returns nothing, it may be because the user’s role does not search all indexes by default. It’s better to search index=* sourcetype=abc… -> Splunk then looks in every index the user has permission to search.
b.) Review the permissions on the eventtype and tags and make sure they are global.
c.) If there is a bundle problem, a newly created tag/eventtype won’t take effect immediately. It’s hard to troubleshoot when your bundle is broken. We will cover this topic in a separate blog.
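For the permission check in item b, the sharing level of an eventtype can also be inspected from the search bar. This is a sketch that assumes your role is allowed to run the rest command; `abc_authentication` is again a hypothetical eventtype name.

```spl
| rest /servicesNS/-/-/saved/eventtypes splunk_server=local
| search title="abc_authentication"
| table title eai:acl.app eai:acl.owner eai:acl.sharing search
```

A sharing value of global means the eventtype is visible to every app, including ES. A value of app or user means it is scoped more narrowly, and the DM may not see it.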

Summary

Configuring the DM with proper scoping as explained above helps reduce your SVC cost in a cloud environment. With the index whitelist and tag whitelist in place, we run specific tags only on specific indexes. Follow the same steps for all the other datamodels in your environment. If you are on the ingest-based model, poor scoping does not cost extra money, but ES dashboards will load slowly and cause performance problems for users.

Please contact us here if you have any questions!