How to Enable Splunk Boot-Start Using Systemd

      By: Jon Walthour  |  Senior Splunk Consultant, Team Lead

 

Splunk first made systemd the default for boot-start in version 7.2.2, because systemd has become the default system initialization and service manager for most major Linux distros. Versions 7.3 through 8.1.0 switched back to SysV init because of shortcomings in how Splunk was using systemd for service startup and shutdown: the startup and shutdown actions prompted for root credentials, which broke many automated processes out in the wild. It wasn’t until version 8.1.1 that an option was added to the “enable boot-start” command to install “Polkit” rules, which grant non-root users like “splunk” enough centralized system control to start and stop the Splunk systemd service. Starting with version 8.1.1, the preferred method for setting up boot-start for Splunk Enterprise is via systemd.

What are the advantages of using systemd, you might ask? Plenty. First, systemd starts services in parallel, so more gets done concurrently during system boot-up. Additionally, it provides a standard framework for expressing dependencies between processes. This means, in the case of the Splunk systemd initialization, Splunk’s startup can be made dependent on network services starting successfully. The configuration of systemd is standardized in plain-text unit files and does not require the creation of custom scripts.

Systemd also offers enhancements specific to Splunk: it provides a way to monitor and manage the splunkd service independent of Splunk itself, along with tools for debugging and troubleshooting boot-time and service-related issues with Splunk, again independent of the Splunk software itself. Most importantly, systemd allows for the use of Linux control groups (cgroups), which form the backbone of the workload management features in Splunk Enterprise.

Below are the steps to enable Splunk to start at system boot under systemd as well as other recommended operating system configurations for Splunk:

1. Install Polkit (if not already installed), then enable boot-start under systemd:

sudo su -
yum -y update
yum -y install polkit

sudo /opt/splunk/bin/splunk enable boot-start -systemd-managed 1 -systemd-unit-file-name splunk -create-polkit-rules 1 -user splunk -group splunk
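The enable boot-start command may stop with a caution about the systemd and polkit versions, as described in the note that follows. A minimal sketch for checking that combination up front is shown here. The function name needs_polkit_workaround is hypothetical, and the commented-out lines assume that "systemctl --version" prints something like "systemd 219" on its first line and that "pkaction --version" prints something like "pkaction version 0.112"; verify both on your own distro.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the versions match the
# combination named in the CAUTION message: systemd < 237 and polkit > 105.
# Both arguments are plain integers, for example 219 and 112.
needs_polkit_workaround() {
    systemd_ver=$1
    polkit_ver=$2
    [ "$systemd_ver" -lt 237 ] && [ "$polkit_ver" -gt 105 ]
}

# On a live host the numbers could be gathered like this (output formats
# are assumptions, see the text above):
# systemd_ver=$(systemctl --version | awk 'NR==1 {print $2}')
# polkit_ver=$(pkaction --version | awk '{print $3}' | cut -d. -f2)
# if needs_polkit_workaround "$systemd_ver" "$polkit_ver"; then
#     echo "Expect the CAUTION prompt; plan for the custom polkit rule."
# fi
```

If the function succeeds for your host, expect the prompt and plan to create the two files described in the note.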

NOTE: If you get the message “CAUTION: The system has systemd version < 237 and polkit version > 105. With this combination, polkit rule created for this user will enable this user to manage all systemd services. Are you sure you want to continue [y/n]?”, select “y,” then create the following two files and run the chmod command that follows them:

vi /etc/polkit-1/rules.d/10-Splunkd.rules

polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.systemd1.manage-units" &&
        subject.user == "splunk") {
        try {
            polkit.spawn(["/usr/local/bin/polkit_splunk", ""+subject.pid]);
            return polkit.Result.YES;
        } catch (error) {
            return polkit.Result.AUTH_ADMIN;
        }
    }
});

vi /usr/local/bin/polkit_splunk

#!/bin/bash -x
COMM=($(ps --no-headers -o cmd -p $1))

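# Allow only start, stop, and restart actions
if [[ "${COMM[1]}" == "start" ]] ||
     [[ "${COMM[1]}" == "stop" ]] ||
     [[ "${COMM[1]}" == "restart" ]]; then

         # The unit name must match the one passed to -systemd-unit-file-name
         if [[ "${COMM[2]}" == "splunk" ]] ||
            [[ "${COMM[2]}" == "splunk.service" ]]; then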
                 exit 0
         fi
fi

exit 1

chmod 755 /usr/local/bin/polkit_splunk
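Because a mistake in polkit_splunk silently falls back to prompting for admin credentials, it can help to dry-run its decision logic before relying on it. The sketch below mirrors the same checks but takes the command line and the unit name as arguments instead of looking up a PID. The function name is_allowed_command is hypothetical and is not part of Splunk or polkit.

```shell
#!/bin/sh
# Hypothetical dry-run of the polkit_splunk checks.
# Usage: is_allowed_command "<command line>" <unit-name>
is_allowed_command() {
    cmdline=$1
    unit=$2
    # Split the command line into words without expanding wildcards:
    # $1 = program, $2 = action, $3 = unit
    set -f
    set -- $cmdline
    set +f
    case "$2" in
        start|stop|restart) ;;
        *) return 1 ;;
    esac
    [ "$3" = "$unit" ] || [ "$3" = "$unit.service" ]
}
```

For example, is_allowed_command "systemctl restart splunk" splunk succeeds, while the same call with "systemctl enable splunk" or with a different unit fails.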

2. Edit the splunk.service file and make the following adjustments:

vi /etc/systemd/system/splunk.service

File created in /etc/systemd/system/splunk.service:
#This unit file replaces the traditional start-up script for systemd
#configurations, and is used when enabling boot-start for Splunk on
#systemd-based Linux distributions.

[Unit]
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=600

LimitCORE=0
LimitDATA=infinity
LimitNICE=0
LimitFSIZE=infinity
LimitSIGPENDING=385952
LimitMEMLOCK=65536
LimitRSS=infinity
LimitMSGQUEUE=819200
LimitRTPRIO=0
LimitSTACK=infinity
LimitCPU=infinity
LimitAS=infinity
LimitLOCKS=infinity
LimitNOFILE=1024000
LimitNPROC=512000
TasksMax=infinity

SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
User=splunk
Group=splunk

Delegate=true
CPUShares=1024
MemoryLimit=<value>
PermissionsStartOnly=true
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/cpu/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/memory/system.slice/%n"

[Install]
WantedBy=multi-user.target

Change or check the following settings in the splunk.service file:

  • Change TimeoutStopSec to 600
  • Add all Limit____ lines and TasksMax
  • Check User and Group
  • Set MemoryLimit to the total system memory, in bytes
  • Check that both “ExecStartPost” chown commands use the right user:group
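For the MemoryLimit value, one way to get total system memory in bytes is to read the MemTotal line from /proc/meminfo, which the kernel reports in kB (units of 1024 bytes), and multiply by 1024. This is a minimal sketch; the function name mem_total_bytes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print MemTotal from a /proc/meminfo-style file, in bytes.
# Usage: mem_total_bytes /proc/meminfo
mem_total_bytes() {
    while read -r key value unit; do
        if [ "$key" = "MemTotal:" ]; then
            # /proc/meminfo reports kB, so convert to bytes
            echo $((value * 1024))
            return 0
        fi
    done < "$1"
    # No MemTotal line found
    return 1
}
```

Running mem_total_bytes /proc/meminfo on the Splunk host prints a single number that can be pasted in place of <value> on the MemoryLimit line.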

3. Add a systemd service for disabling transparent huge pages (THP):

vi /etc/systemd/system/disable-thp.service

[Unit]
Description=Disable Transparent Huge Pages (THP)

[Service]
Type=simple
ExecStart=/bin/sh -c "echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled && echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag"

[Install]
WantedBy=multi-user.target
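Once this service has run, the kernel marks the active THP setting with square brackets, for example "always madvise [never]". A small sketch for pulling out the active value is shown below; the function name thp_active_value is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (active) value from a THP sysfs file.
# Usage: thp_active_value /sys/kernel/mm/transparent_hugepage/enabled
thp_active_value() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}
```

Calling it on both /sys/kernel/mm/transparent_hugepage/enabled and /sys/kernel/mm/transparent_hugepage/defrag should print "never" for each when THP is disabled.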

4. Reload systemd, start and enable both services, and reboot:

sudo systemctl daemon-reload

sudo systemctl start disable-thp
sudo systemctl enable disable-thp

sudo systemctl start splunk
sudo systemctl enable splunk

sudo shutdown -r now

5. Lastly, check your work to ensure (a) Splunk was started under systemd and (b) transparent huge pages are disabled and ulimits are set according to the values defined in the systemd unit file.

$ ps -ef|grep splunkd
splunk 3848 1 3 12:10 ? 00:00:04 splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd
splunk 4731 3848 0 12:10 ? 00:00:00 [splunkd pid=3848] splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd [process-runner]
splunk 4931 4731 0 12:10 ? 00:00:00 /opt/splunk/bin/splunkd instrument-resource-usage -p 8089 --with-kvstore
splunk 5504 5442 0 12:12 pts/0 00:00:00 grep --color=auto splunkd

$ splunk status
splunkd is running (PID: 3848).
splunk helpers are running (PIDs: 4731 4749 4853 4931).

$ grep ulimit /opt/splunk/var/log/splunk/splunkd.log
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: virtual address space size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: data segment size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: resident memory size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: stack size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: core file size: 0 bytes
06-01-2021 12:05:17.513 +0000 WARN ulimit - Core file generation disabled.
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: data file size: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: open files: 1024000 files
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: user processes: 512000 processes
06-01-2021 12:05:17.513 +0000 INFO ulimit - Limit: cpu time: unlimited
06-01-2021 12:05:17.513 +0000 INFO ulimit - Linux transparent hugepage support, enabled="never" defrag="never"
06-01-2021 12:05:17.513 +0000 INFO ulimit - Linux vm.overcommit setting, value="0"
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: virtual address space size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: data segment size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: resident memory size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: stack size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: core file size: 0 bytes
06-01-2021 12:10:55.997 +0000 WARN ulimit - Core file generation disabled.
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: data file size: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: open files: 1024000 files
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: user processes: 512000 processes
06-01-2021 12:10:55.997 +0000 INFO ulimit - Limit: cpu time: unlimited
06-01-2021 12:10:55.997 +0000 INFO ulimit - Linux transparent hugepage support, enabled="never" defrag="never"
06-01-2021 12:10:55.997 +0000 INFO ulimit - Linux vm.overcommit setting, value="0"
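
As a cross-check that does not depend on splunkd.log, the limits applied to the running process can also be read from /proc/<PID>/limits, using the splunkd PID reported by "splunk status". The sketch below extracts the soft limit for a named row; the function name soft_limit is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the soft limit for a named row of a
# /proc/<PID>/limits-style file.
# Usage: soft_limit /proc/3848/limits "Max open files"
soft_limit() {
    awk -v name="$2" '
        # Match rows that begin with the limit name
        index($0, name) == 1 {
            rest = substr($0, length(name) + 1)
            split(rest, fields, " ")
            print fields[1]   # first column after the name is the soft limit
            exit
        }
    ' "$1"
}
```

With the unit file above, "Max open files" should report 1024000 and "Max processes" should report 512000.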

Want to learn more about Splunk boot-start and systemd? Contact us today!