Deep Freeze Your Splunk Data in AWS, Part 2

Caroline Lea
July 29, 2021
02:21 pm

By: Zubair Rauf | Senior Splunk Consultant, Team Lead

In Part 1 of this blog post, we touched on the need to freeze Splunk data in AWS S3. In that post, we described how to do this using a script to move the Splunk bucket into S3. In this post, we will describe how to accomplish the same result by mounting the S3 bucket on every indexer using S3FS-Fuse, then telling Splunk to just move the bucket to that mountpoint directly. S3FS is a package available in the EPEL Repository. EPEL (Extra Packages for Enterprise Linux) is a repository that provides additional packages for Linux from the Fedora sources.

High Level Process

Install S3FS-Fuse
Mount S3 Bucket using S3FS to a chosen mountpoint
Make the mountpoint persistent by updating the rc.local script
Update the index to use coldToFrozenDir when freezing data
Verify frozen data exists in S3 bucket

Dependencies and Installation

The following packages are required to make this work:

Package	Repository
S3FS-Fuse	epel
dependency: fuse	amzn2-core
dependency: fuse-lib	amzn2-core

Note: This test was done on instances running CentOS 7 that did not have the EPEL repo added for yum. Therefore, we had to install EPEL as well before proceeding with the S3FS-Fuse installation.

Install S3FS-Fuse

The following commands were used to install EPEL and S3FS-Fuse on the test indexers. Run them as root on each indexer host. Once they are tested to work in your environment, they can also be scripted.

cd /tmp
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install ./epel-release-latest-7.noarch.rpm
rpm --import https://download.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-6
yum -y install s3fs-fuse
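
Before moving on to the mount step, it can help to confirm that the installation actually put the s3fs binary on the PATH. The following is a minimal sketch of such a check; require_cmd is a hypothetical helper name used for illustration and is not part of S3FS-Fuse or Splunk.

```shell
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example usage on an indexer (commented out so nothing runs on paste):
# require_cmd s3fs && s3fs --version
```

If the check reports the command as missing, re-run the yum steps above before continuing.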

Mounting the S3 Bucket

Use the following commands to mount the S3 Bucket to a new folder called frozen-s3 in /opt/splunk/data.

Note: This method uses the passwd-s3fs file to access the S3 bucket. Please ensure that the AWS credentials you use have access to the S3 bucket. The credentials must be an Access Key and Secret Key generated for an AWS user. I created a user, ‘splunk_access,’ which has a role, ‘splunk-s3-archival,’ attached to it. This role has explicit permissions to access my test S3 bucket.

The S3 bucket has the following JSON policy attached to it, which gives the ‘splunk-s3-archival’ role full access to the bucket. The account_id in the policy is your 12-digit account number.

{
    "Version": "2012-10-17",
    "Id": "Policy1607555060391",
    "Statement": [
        {
            "Sid": "Stmt1607555054806",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account_id>:role/splunk-s3-archival"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::splunk-to-s3-frozen-demo"
        }
    ]
}

The following commands should be run as root on the server. Please make sure to update the following variables listed inside < > in the commands with your respective values:

splunk_user → The user that Splunk will run as
aws_region → The AWS region your S3 bucket was created in
bucket_name → S3 bucket name
mount_point → Path to the directory where the S3 bucket will be mounted

These commands can be run on one indexer manually to test in your environment and scripted for the remaining indexers.

cd /opt/splunk/data
mkdir frozen-s3
cd /opt/splunk/data/frozen-s3
sudo vi /home/<splunk_user>/.s3fs/passwd-s3fs ## Add AWS Access_key:Secret_key in this file
sudo chmod 600 /home/<splunk_user>/.s3fs/passwd-s3fs
su <splunk_user> -c 's3fs <bucket_name> <mount_point> -d -o passwd_file=/home/<splunk_user>/.s3fs/passwd-s3fs,allow_other,endpoint=<aws_region>'
echo "su <splunk_user> -c 's3fs <bucket_name> <mount_point> -d -o passwd_file=/home/<splunk_user>/.s3fs/passwd-s3fs,allow_other,endpoint=us-east-2'" >> /etc/rc.d/rc.local ## replace us-east-2 with your <aws_region>
chmod +x /etc/rc.d/rc.local
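
A mount that fails is often caused by the passwd-s3fs file itself, so a quick check of that file before mounting can save some debugging. The sketch below is one way to do it, assuming GNU stat (as shipped with CentOS 7) and the Access_key:Secret_key format described above; check_passwd_file is a hypothetical helper name.

```shell
# Hypothetical helper: check an s3fs password file before mounting.
# Assumes GNU stat and a line of the form ACCESS_KEY:SECRET_KEY.
check_passwd_file() {
    local file="$1" mode
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }
    mode=$(stat -c '%a' "$file")
    if [ "$mode" != "600" ]; then
        echo "mode is $mode, expected 600: $file" >&2
        return 1
    fi
    # At least one line must be a KEY:SECRET pair with no spaces.
    grep -Eq '^[^:[:space:]]+:[^:[:space:]]+$' "$file" || {
        echo "expected ACCESS_KEY:SECRET_KEY in $file" >&2
        return 1
    }
    echo "ok: $file"
}
```

The function only inspects the file's mode and shape; it does not verify that the keys are valid or that they grant access to the bucket.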

Appending the mount command to rc.local ensures that the S3 bucket is mounted again on boot.
Once you have manually mounted the S3 bucket, use the following command to verify that the bucket has mounted successfully.

df -h

## I get the following output from df -h
[centos@s3test ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        485M     0  485M   0% /dev
tmpfs           495M     0  495M   0% /dev/shm
tmpfs           495M  6.8M  488M   2% /run
tmpfs           495M     0  495M   0% /sys/fs/cgroup
/dev/xvda2       10G  3.3G  6.8G  33% /
tmpfs            99M     0   99M   0% /run/user/1001
s3fs            256T     0  256T   0% /opt/splunk/data/frozen-s3
tmpfs            99M     0   99M   0% /run/user/1000
tmpfs            99M     0   99M   0% /run/user/0

The s3fs entry mounted on /opt/splunk/data/frozen-s3 is the S3 bucket mounted using S3FS.
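
If you script the rollout to the remaining indexers, reading df output by eye does not scale. The following sketch checks the mount programmatically instead. It assumes that s3fs mounts are listed in /proc/mounts with the filesystem type fuse.s3fs; is_s3fs_mounted is a hypothetical helper name, and its optional second argument exists only so the function can be pointed at a different mounts table.

```shell
# Hypothetical helper: succeed if an s3fs filesystem is mounted at the path.
# The optional second argument is the mounts table to read (default:
# /proc/mounts). Assumes s3fs mounts are listed with type "fuse.s3fs".
is_s3fs_mounted() {
    local mount_point="$1" table="${2:-/proc/mounts}"
    awk -v mp="$mount_point" \
        '$2 == mp && $3 == "fuse.s3fs" { found = 1 } END { exit !found }' \
        "$table"
}

# Example usage (commented out so nothing runs on paste):
# is_s3fs_mounted /opt/splunk/data/frozen-s3 && echo "bucket is mounted"
```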

Setting Up Test Index

Create a test index with the following settings and push it out to all indexers through the Cluster Master:

[s3-test]
homePath = $SPLUNK_DB/s3-test/db
coldPath = $SPLUNK_DB/s3-test/colddb
thawedPath = $SPLUNK_DB/s3-test/thaweddb
frozenTimePeriodInSecs = 600
maxDataSize = 10
maxHotBuckets = 1
maxWarmDBCount = 1
coldToFrozenDir = $SPLUNK_HOME/data/frozen-s3/_index_name

The coldToFrozenDir parameter in the above stanza defines where Splunk should freeze the data for this index. This needs to be set for every index you wish to freeze. Splunk will automatically replace the _index_name variable in the coldToFrozenDir parameter with the index name (s3-test in this case). This makes it easier to copy the parameter to multiple individual index stanzas.

For testing purposes, the frozenTimePeriodInSecs, maxDataSize, maxHotBuckets, and maxWarmDBCount have been set very low. This is to ensure that the test index rolls data fast. In production, these values should either be left as the default or changed in consultation with a Splunk Architect.
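
For production indexes, retention is usually thought of in days, while frozenTimePeriodInSecs takes seconds. The sketch below prints only the freeze-related lines of a stanza for a given index and retention in days. It is an illustration, not a complete index definition: homePath, coldPath, and thawedPath still need to be defined as in the stanza above. The frozen_stanza helper and the web-logs index name are hypothetical, and the index name is written out explicitly in the path.

```shell
# Hypothetical helper: print the freeze-related indexes.conf lines for an
# index, converting a retention period in days to seconds.
frozen_stanza() {
    local index="$1" days="$2"
    printf '[%s]\n' "$index"
    printf 'frozenTimePeriodInSecs = %d\n' "$((days * 86400))"
    # Single quotes keep $SPLUNK_HOME literal so the shell does not expand it.
    printf 'coldToFrozenDir = $SPLUNK_HOME/data/frozen-s3/%s\n' "$index"
}

frozen_stanza web-logs 90
# [web-logs]
# frozenTimePeriodInSecs = 7776000
# coldToFrozenDir = $SPLUNK_HOME/data/frozen-s3/web-logs
```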

The index needs to be set up or updated on the Cluster Master, and the configuration pushed out to all indexers.

Adding and Freezing Data

Once the indexers are restarted with the correct settings for the index, upload sample data from the UI. Keep adding sample data files to the index until it starts to roll hot and warm buckets over to cold and, eventually, frozen. At that point, you will start to see your frozen data in your S3 bucket.
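
To confirm from the indexer side that buckets are arriving, you can count the bucket directories under the frozen path for the test index. This is a minimal sketch that assumes Splunk's usual bucket directory naming (db_* for original buckets, rb_* for replicated copies); count_frozen_buckets is a hypothetical helper name.

```shell
# Hypothetical helper: count frozen bucket directories directly under a path.
# Assumes bucket directories are named db_* (originals) or rb_* (replicas).
count_frozen_buckets() {
    find "$1" -mindepth 1 -maxdepth 1 -type d \
        \( -name 'db_*' -o -name 'rb_*' \) | wc -l | tr -d ' '
}

# Example usage with the test index path (commented out so nothing runs on paste):
# count_frozen_buckets /opt/splunk/data/frozen-s3/s3-test
```

A count that grows as you add sample data indicates that the freeze path is working; the same directories should then be visible as objects in the S3 bucket.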

Want to learn more about Freezing Data to S3 with Splunk? Contact us today!