Deep Freeze Your Splunk Data in AWS, Part 2

      By: Zubair Rauf  |  Senior Splunk Consultant, Team Lead

In Part 1 of this blog post, we touched on the need to freeze Splunk data to AWS S3 and described how to do it with a script that moves Splunk buckets into S3. In this post, we will describe how to accomplish the same result by mounting the S3 bucket on every indexer using S3FS-Fuse and then telling Splunk to move frozen buckets to that mountpoint directly. S3FS is a package available in the EPEL (Extra Packages for Enterprise Linux) repository, which provides additional packages for Linux built from Fedora sources.

High Level Process

  1. Install S3FS-Fuse
  2. Mount S3 Bucket using S3FS to a chosen mountpoint
  3. Make the mountpoint persistent by updating the rc.local script
  4. Update the index to use ColdToFrozenDir when freezing data
  5. Verify frozen data exists in S3 bucket

Dependencies and Installation

The following packages are required to make this work:

Package                  Repository
s3fs-fuse                epel
fuse (dependency)        amzn2-core
fuse-libs (dependency)   amzn2-core

Note: This test was done on instances running CentOS 7 that did not have the EPEL repo added to yum. Therefore, we had to install EPEL as well before proceeding with the S3FS-Fuse installation.

Install S3FS-Fuse

The following commands were used to install EPEL and S3FS-Fuse on the test indexers (once tested to work in your environment, they can also be scripted). Run them as root on the indexer hosts.

cd /tmp
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install ./epel-release-latest-7.noarch.rpm
rpm --import https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-7
yum -y install s3fs-fuse
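
To confirm the install worked before going any further, you can check that the package and its FUSE dependencies are present and that the s3fs binary responds:

rpm -q s3fs-fuse fuse fuse-libs
s3fs --version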

Mounting the S3 Bucket

Use the following commands to mount the S3 bucket to a new folder called frozen-s3 in /opt/splunk/data.

Note: This method uses the passwd-s3fs file to authenticate to the S3 bucket. Please ensure that the AWS credentials you use have access to the S3 bucket; they must belong to an IAM user with an Access Key and Secret Key generated. I created a user, ‘splunk_access,’ which has a role, ‘splunk-s3-archival,’ attached to it. This role has explicit permissions to access my test S3 bucket.

The S3 bucket has the following JSON policy attached to it, which gives the ‘splunk-s3-archival’ role full access to the bucket and the objects inside it. Replace <account_id> in the Principal ARN with your 12-digit AWS account number.

{
    "Version": "2012-10-17",
    "Id": "Policy1607555060391",
    "Statement": [
        {
            "Sid": "Stmt1607555054806",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account_id>:role/splunk-s3-archival"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::splunk-to-s3-frozen-demo",
                "arn:aws:s3:::splunk-to-s3-frozen-demo/*"
            ]
        }
    ]
}
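
Before wiring this into Splunk, it is worth confirming that the keys generated for ‘splunk_access’ can actually reach the bucket. A quick sanity check with the AWS CLI, if you have it installed (the profile name here is just an example):

aws s3 ls s3://splunk-to-s3-frozen-demo --profile splunk_access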

The following commands should be run as root on the server. Please make sure to update the following variables listed inside < > in the commands with your respective values:

  • <splunk_user> → The user that Splunk will run as
  • <aws_region> → The AWS region your S3 bucket was created in
  • <bucket_name> → S3 bucket name
  • <mount_point> → Path to the directory where the S3 bucket will be mounted

These commands can be run on one indexer manually to test in your environment and scripted for the remaining indexers.


cd /opt/splunk/data
mkdir frozen-s3
mkdir -p /home/<splunk_user>/.s3fs
vi /home/<splunk_user>/.s3fs/passwd-s3fs ## Add AWS Access_key:Secret_key in this file
chmod 600 /home/<splunk_user>/.s3fs/passwd-s3fs
chown -R <splunk_user> /home/<splunk_user>/.s3fs
su <splunk_user> -c 's3fs <bucket_name> <mount_point> -d -o passwd_file=/home/<splunk_user>/.s3fs/passwd-s3fs,allow_other,endpoint=<aws_region>'
echo "su <splunk_user> -c 's3fs <bucket_name> <mount_point> -d -o passwd_file=/home/<splunk_user>/.s3fs/passwd-s3fs,allow_other,endpoint=<aws_region>'" >> /etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local

Adding the mount command to rc.local ensures that the S3 bucket is mounted again automatically when the server boots.
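
As an alternative to rc.local, s3fs mounts can also be made persistent through /etc/fstab. A minimal sketch using the same placeholder values as above (the _netdev option delays mounting until networking is up):

<bucket_name> <mount_point> fuse.s3fs _netdev,allow_other,passwd_file=/home/<splunk_user>/.s3fs/passwd-s3fs,endpoint=<aws_region> 0 0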
Once you have manually mounted the S3 bucket, you can use the following command to verify the bucket has mounted successfully.


df -h
## I get the following output from df -h

[centos@s3test ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        485M     0  485M   0% /dev
tmpfs           495M     0  495M   0% /dev/shm
tmpfs           495M  6.8M  488M   2% /run
tmpfs           495M     0  495M   0% /sys/fs/cgroup
/dev/xvda2       10G  3.3G  6.8G  33% /
tmpfs            99M     0   99M   0% /run/user/1001
s3fs            256T     0  256T   0% /opt/splunk/data/frozen-s3
tmpfs            99M     0   99M   0% /run/user/1000
tmpfs            99M     0   99M   0% /run/user/0

The s3fs entry in the output is the S3 bucket mounted using S3FS.
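
For a simple end-to-end check, you can write a test file to the mountpoint and confirm it shows up as an object in the bucket (again assuming the AWS CLI is available with the same credentials):

touch /opt/splunk/data/frozen-s3/mount-test
aws s3 ls s3://splunk-to-s3-frozen-demo --profile splunk_access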

Setting Up Test Index

Create a test index with the following settings and push it out to all indexers through the Cluster Master:

[s3-test]
homePath = $SPLUNK_DB/s3-test/db
coldPath = $SPLUNK_DB/s3-test/colddb
thawedPath = $SPLUNK_DB/s3-test/thaweddb
frozenTimePeriodInSecs = 600
maxDataSize = 10
maxHotBuckets = 1
maxWarmDBCount = 1
coldToFrozenDir = $SPLUNK_HOME/data/frozen-s3/$_index_name

The coldToFrozenDir parameter in the above stanza defines where Splunk freezes the data for this index. It needs to be set for every index you wish to freeze. Splunk automatically replaces the $_index_name macro in the coldToFrozenDir parameter with the index name (‘s3-test’ in this case), which makes it easy to copy the same parameter into multiple individual index stanzas.
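
For example, two additional (hypothetical) index stanzas could reuse the exact same line, and each index would be frozen into its own folder under the mountpoint:

[firewall]
coldToFrozenDir = $SPLUNK_HOME/data/frozen-s3/$_index_name

[web_proxy]
coldToFrozenDir = $SPLUNK_HOME/data/frozen-s3/$_index_name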

For testing purposes, frozenTimePeriodInSecs, maxDataSize, maxHotBuckets, and maxWarmDBCount have been set very low to ensure that the test index rolls data quickly. In production, these values should either be left at their defaults or changed in consultation with a Splunk Architect.

The index needs to be set up or updated on the Cluster Master and the configuration pushed out to all indexers.
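
On the Cluster Master, the push typically looks like the following, assuming the index stanza was placed under $SPLUNK_HOME/etc/master-apps/_cluster/local/indexes.conf:

$SPLUNK_HOME/bin/splunk validate cluster-bundle
$SPLUNK_HOME/bin/splunk apply cluster-bundle
$SPLUNK_HOME/bin/splunk show cluster-bundle-status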

Adding and Freezing Data

Once the indexers have restarted with the correct settings for the index, upload sample data from the UI. Keep adding sample data files to the index until it starts to roll hot and warm buckets to cold and, eventually, frozen. At that point, you will start to see your frozen data appear in your S3 bucket.
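
You can verify this from the S3 side with the AWS CLI. Note that when Splunk freezes a bucket via coldToFrozenDir, it keeps only the compressed rawdata journal (the index files are removed and are rebuilt if the bucket is ever thawed), so expect to see paths like db_.../rawdata/journal.gz under the index folder:

aws s3 ls s3://splunk-to-s3-frozen-demo/s3-test/ --recursive --profile splunk_access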

Want to learn more about Freezing Data to S3 with Splunk? Contact us today!