Making Your List and Checking It Twice? Sanity Reviewing Your Otel Collector Configuration
By William Phelps, Senior Technical Architect
It’s very hard to turn on a television set during the Christmas season and not see National Lampoon’s Christmas Vacation playing on one or more channels.
One of the film’s signature moments is when Clark Griswold’s light display finally illuminates. Could you really imagine having to check every single light in every single strand for a bad bulb? Shouldn’t the first order of business be “check the bulbs before hanging the light strings”?
The same lack of checklist validation often occurs with an Otel collector deployment. Every step was supposedly followed, yet we end up wearing the same look of despair and frustration that Clark had when he plugged the last two cords together and nothing happened.
Because the Otel collector is typically installed or deployed as one part of a multi-component process, it’s prudent to verify each component along the way. That way, if the collector is not outputting data correctly, you can point at the Otel collector configuration as the culprit with far more confidence.
Consider a very basic setup: a Linux server, some flavor of containerization (Docker and Kubernetes), and an application deployed via a Helm chart. That is four large, wide-ranging buckets to address before even considering the Otel collector. Are you sure that everything in these buckets is functioning correctly before you configure the collector?
Can the server connect to the internet? Try a simple curl to a public website. (You did install curl, right?)
curl checkip.amazonaws.com
Right out of the gate, simply reaching an external website rules out basic firewall and routing issues between the server, your network, and the internet. This should always be “test number 1”. This particular site has a bonus: it replies with the server’s public IP address, which is handy whenever that IP is needed.
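If you want to script this first check, here is a minimal POSIX shell sketch. It assumes curl is installed and that checkip.amazonaws.com replies with a bare IPv4 address; the function names looks_like_ipv4 and check_internet are illustrative, not part of any tool.

```shell
# Minimal POSIX sh sketch of "test number 1". Source it, then run check_internet.
# looks_like_ipv4 and check_internet are illustrative names.

# Succeeds if $1 is a dotted-quad IPv4 address with every octet in 0-255.
looks_like_ipv4() {
  local old_ifs octet
  case $1 in
    ''|*[!0-9.]*|*..*|.*|*.) return 1 ;;   # empty, non-digits, or misplaced dots
  esac
  old_ifs=$IFS
  IFS=.
  set -- $1          # split on dots into positional parameters
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
}

# Fetches the public IP with a 10-second timeout; non-zero exit on failure.
check_internet() {
  local ip
  ip=$(curl -fsS --max-time 10 checkip.amazonaws.com) || {
    echo "FAIL: cannot reach checkip.amazonaws.com (firewall, proxy, DNS, or routing?)" >&2
    return 1
  }
  ip=$(printf '%s' "$ip" | tr -d '[:space:]')   # drop the trailing newline
  if looks_like_ipv4 "$ip"; then
    echo "OK: internet reachable, public IP is $ip"
  else
    echo "WARN: unexpected reply: $ip" >&2
    return 1
  fi
}
```

A non-zero exit from check_internet means the problem lies somewhere in the network path, long before the collector is involved.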
Assuming that your internet connectivity is good, and that you have installed Docker, the next test is to make sure that Docker is running and returning basic information.
systemctl status docker
docker ps
docker info
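The same three Docker checks can be wrapped in a small PASS/FAIL report, sketched below in POSIX shell. It assumes a systemd-based host; check and check_docker are hypothetical helper names.

```shell
# Minimal POSIX sh sketch: source this file, then run check_docker.
# check and check_docker are illustrative helper names.

# Runs a command quietly and prints PASS or FAIL with a label.
# Returns 0 if the command succeeded, 1 otherwise.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

# The three Docker checks, reported individually. Assumes systemd.
check_docker() {
  local rc=0
  check "docker service is active" systemctl is-active --quiet docker || rc=1
  check "docker ps reaches the daemon" docker ps || rc=1
  check "docker info returns daemon details" docker info || rc=1
  return "$rc"
}
```

A FAIL on docker ps while the service is active often points to permissions: by default, only root (or a user in the docker group) can talk to the daemon socket.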
Once Docker is confirmed, verify that Kubernetes is functional, first at the cluster level and then for your application’s deployment. The idea is to validate that the pieces ultimately being monitored are working as expected before suspecting the collector installation.
kubectl get pods --all-namespaces (ideally, every pod shows a STATUS of Running or Completed)
kubectl get deploy,po,ing,svc -n <your application namespace>
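Scanning a long pod list by eye is error-prone, so here is a minimal sketch of a filter that prints only the pods needing attention. It assumes the default kubectl get pods column layout (NAMESPACE, NAME, READY, STATUS, ...) produced with --all-namespaces --no-headers; unhealthy_pods is a hypothetical name. It also flags Running pods whose containers are not all ready, such as 1/2.

```shell
# Minimal sketch: reads `kubectl get pods --all-namespaces --no-headers`
# output on stdin and prints pods that need attention. Assumes the default
# column order NAMESPACE NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '{
    split($3, ready, "/")               # READY, e.g. "1/2" -> ready[1]=1, ready[2]=2
    if ($4 == "Completed") next         # finished Jobs are fine
    if ($4 == "Running" && ready[1] == ready[2]) next   # running and fully ready
    print $1 "/" $2 " status=" $4 " ready=" $3
  }'
}
```

Usage: kubectl get pods --all-namespaces --no-headers | unhealthy_pods. Empty output means every pod is either Completed or Running with all containers ready.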
On Kubernetes, the collector is deployed via a Helm chart. Make sure you are running Helm 3 or later.
helm version
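To turn the version check into a pass/fail test, a sketch like the following extracts the major version from helm version --short (which prints a string such as v3.x.y+g<commit>) and compares it with 3. The parsing is an assumption about that output format; helm_is_v3_plus is a hypothetical name.

```shell
# Minimal sketch: succeeds if the given `helm version --short` output
# reports a major version of 3 or higher.
helm_is_v3_plus() {
  local major
  # Take the digits between the first "v" and the following "." on line 1.
  major=$(printf '%s\n' "$1" | head -n 1 | sed -n 's/^[^v]*v\([0-9][0-9]*\)\..*/\1/p')
  [ -n "$major" ] && [ "$major" -ge 3 ]
}
```

Usage: helm_is_v3_plus "$(helm version --short 2>/dev/null)" && echo "Helm OK" || echo "Upgrade to Helm 3 or later"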
If your application emits metrics via an endpoint, make sure that the endpoint is active by sending a curl request to it. For example, a request to a fairly typical Prometheus metrics endpoint looks like this:
curl http://localhost:9090/metrics
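A successful HTTP response is not quite the same as useful metrics, so this sketch also counts the sample lines in the Prometheus text format, skipping blank lines and # HELP / # TYPE comments. The default URL mirrors the example above; substitute your application’s real endpoint. count_samples and check_metrics_endpoint are hypothetical names.

```shell
# Minimal sketch. Counts sample lines in Prometheus text format on stdin,
# ignoring blank lines and comment lines such as "# HELP" and "# TYPE".
count_samples() {
  grep -Ecv '^[[:space:]]*(#.*)?$' || true   # grep -c still prints 0 when nothing is selected
}

# Fetches a metrics endpoint and fails if it is unreachable or empty.
check_metrics_endpoint() {
  local url="${1:-http://localhost:9090/metrics}" body n
  body=$(curl -fsS --max-time 10 "$url") || {
    echo "FAIL: $url is unreachable or returned an HTTP error" >&2
    return 1
  }
  n=$(printf '%s\n' "$body" | count_samples)
  if [ "$n" -gt 0 ]; then
    echo "OK: $url exposes $n samples"
  else
    echo "FAIL: $url responded but exposed no samples" >&2
    return 1
  fi
}
```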
At this point, if the previous steps are successful, log in to the APM UI and head over to Data Management. Find your Kubernetes flavor and follow the instructions for the base installation.
After the “basic” Helm install, check the APM UI dashboards. Take the time to get a good look at the initial installation results; they are the baseline you will compare every later change against.
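Alongside the dashboards, it can help to save the release’s own view of the baseline. The sketch below writes the output of helm status, helm get values --all, and helm history to a directory; the release name, namespace, and the function name snapshot_release are placeholders you would supply.

```shell
# Minimal sketch: record a Helm release's baseline state in a directory.
# Usage: snapshot_release <release> <namespace> [dir]   (names are placeholders)
snapshot_release() {
  local release="$1" ns="$2" dir="${3:-./baseline}"
  [ -n "$release" ] && [ -n "$ns" ] || {
    echo "usage: snapshot_release <release> <namespace> [dir]" >&2
    return 1
  }
  mkdir -p "$dir" || return 1
  helm status "$release" -n "$ns" > "$dir/status.txt" || return 1
  helm get values "$release" -n "$ns" --all -o yaml > "$dir/values.yaml" || return 1
  helm history "$release" -n "$ns" > "$dir/history.txt" || return 1
  echo "Baseline for $release saved in $dir"
}
```

If a later iteration goes wrong, helm history lists the revision numbers, and helm rollback <release> <revision> -n <namespace> returns the release to the known-good one.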
Once the baseline is confirmed as successful, apply your desired changes via your custom YAML file. Make small edits, test each one, and keep a good backup during the iterations. Deploy the Helm chart again (for example, with helm upgrade) using the -f option to point to your custom YAML file. Bonus note: if you use multiple YAML files, be mindful of the order in which you declare them. Helm merges the values in the order the files appear on the command line, and when the same key appears in more than one file, the last (right-most) file wins.
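The small-edit, keep-a-backup loop might look like the sketch below. It assumes two hypothetical values files, base-values.yaml and custom-values.yaml, and that RELEASE, CHART, and NAMESPACE are set in the environment; none of these names come from a specific chart. Because Helm gives priority to the right-most -f file, the custom file is listed last.

```shell
# Minimal sketch of the small-edit loop. Hypothetical names throughout:
# base-values.yaml, custom-values.yaml, RELEASE, CHART, NAMESPACE.

# Copies a values file to a timestamped backup and prints the backup path.
backup_values() {
  local file="$1" dir="${2:-./values-backups}" dest
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  mkdir -p "$dir" || return 1
  dest="$dir/$(basename "$file").$(date +%Y%m%d-%H%M%S)"
  cp -p "$file" "$dest" && echo "$dest"
}

# Backs up the custom file, then redeploys. Helm gives the right-most -f file
# the highest priority, so the custom overrides are listed last.
redeploy() {
  backup_values custom-values.yaml >/dev/null || return 1
  helm upgrade "$RELEASE" "$CHART" -n "$NAMESPACE" \
    -f base-values.yaml \
    -f custom-values.yaml
}
```

Run redeploy after each small edit, then compare the dashboards against the baseline before making the next change.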
This article is intended as a basic “how-to” that illustrates a simple process. Please review the solution and add steps, checks, or processes as applicable to your use case. It is not intended to be “production-ready” for every case; it is a “bare bones” walkthrough that covers the concept as concisely as possible for educational purposes.
Do you need someone to look over your list, perhaps check it twice? Contact us and let us take a look at your Otel configuration.
About the Author
William has over twenty-seven years of experience in the design, development, and implementation of web-based enterprise applications. He has worked for clients within the manufacturing, services, energy, and government industries. His areas of expertise include Content Server Web Content Management, Universal Records Management, Imaging, and Inspyrus. He has enjoyed tremendous success communicating complex technical subjects to the diverse audiences encountered in the typical business organization.
William has experience in WebCenter Content from versions 4.6 to 12c, possessing end-to-end, top-to-bottom implementation experience in gathering requirements, identifying gaps, and implementing the final solution. William has deep experience in customizing the product to fit implementation needs. William also has extensive experience in WebCenter Content Records from versions 7.5 to 11g, again possessing end-to-end, top-to-bottom implementation experience across a wide swath of industries. He is recognized by Oracle product management as one of the most knowledgeable and experienced resources for records management.
William has experience implementing WebCenter Content Imaging from version 10g to 11g, focusing on implementing accounts payable-centric solutions. He has deep experience with Oracle Document Capture/WebCenter Enterprise Capture and Oracle Forms Recognition. William is also Inspyrus certified on versions through 4.3.11, with implementation exposure to EBS R12, PeopleSoft, Oracle Fusion, and JDE.
William currently holds a Splunk Observability Consultant I certification and helps clients troubleshoot issues and maximize their observability investment.
