Posted on 27 November 2019, updated on 6 August 2021.
When deploying a managed Kubernetes cluster, GCP automatically provides integration with Stackdriver for the monitoring of its resources (e.g. node and pod resources).
But did you know that it’s also possible to have Stackdriver monitor the Prometheus metrics of other applications and even your own applications? We will explain how in the following sections, with the monitoring of a RabbitMQ cluster.
Overview
Note that for this to work we will use the following components:
- A GCP-hosted Stackdriver, with a workspace that receives metrics for project <GCP_PROJECT_ID>
- A Prometheus pod that will collect metrics and then send them to Stackdriver
- A RabbitMQ exporter that exposes metrics in the Prometheus format (it could in fact be any other Prometheus exporter)
- Note: for this method to work, the exporter needs to be accessible through a Kubernetes Service
- A RabbitMQ cluster
These components will be connected as described in the diagram below.
Deploy the solution in a Kubernetes cluster
Deploy RabbitMQ exporter
We will deploy the RabbitMQ Prometheus exporter using Helm with the following command:
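A minimal sketch using Helm 3 syntax, assuming the prometheus-rabbitmq-exporter chart from the (now archived) Helm stable repository, a monitoring namespace and a RabbitMQ management API reachable at http://rabbitmq:15672; the release name, namespace and credentials are placeholders to adapt, and value names may differ slightly between chart versions:

```bash
# Create the namespace and install the RabbitMQ Prometheus exporter
kubectl create namespace monitoring
helm install rabbitmq-exporter stable/prometheus-rabbitmq-exporter \
  --namespace monitoring \
  --set rabbitmq.url=http://rabbitmq:15672 \
  --set rabbitmq.user=guest \
  --set rabbitmq.password=guest
```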
This Helm chart deploys:
- A Pod that will connect to RabbitMQ in order to get various metrics
- A Service that will expose the metrics on port 9419 at URI /metrics
You can check that they are available:
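For instance (the exact resource names depend on your release name):

```bash
# The exporter Pod and Service should appear in the namespace used for the release
kubectl get pods,svc -n monitoring | grep rabbitmq-exporter
```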
And then locally expose the service in order to check that the metrics are available:
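Assuming the Service name generated for the release above (check the actual name with kubectl get svc):

```bash
# Forward the exporter Service locally, then open http://localhost:9419/metrics
kubectl port-forward -n monitoring svc/rabbitmq-exporter-prometheus-rabbitmq-exporter 9419:9419
```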
This will also give you an idea of which metrics are available (http://localhost:9419/metrics).
Note: The latest versions of RabbitMQ also offer a plugin that automatically exposes these metrics. If you use this plugin then you shouldn’t need the exporter.
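For reference, on RabbitMQ 3.8 or later the built-in plugin can be enabled with something like the following (it then serves metrics on port 15692 rather than through a separate exporter):

```bash
# Enable the native Prometheus plugin on a RabbitMQ node
rabbitmq-plugins enable rabbitmq_prometheus
```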
These metrics are in the format that Prometheus expects, so let's deploy Prometheus next.
Deploy Prometheus and patch RabbitMQ exporter service
Prometheus can be deployed in many ways (e.g. with the Prometheus Operator), but in this case, we will use a simple Deployment.
First, before actually deploying Prometheus, we need to deploy its configuration.
And since we will use the ability of Prometheus to perform Kubernetes Service discovery, we will need a few things (a condensed sketch is given after this list):
- A ClusterRole and a ClusterRoleBinding that allow the Prometheus Pod to list and get Kubernetes Service descriptions through the Kubernetes REST API
- A ConfigMap that provides the Prometheus configuration
- And finally the Prometheus Deployment itself
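Here is a condensed sketch of these manifests. The names, the monitoring namespace, the image tag and the annotation-based relabelling rules are assumptions (the configuration used in the original setup may differ), and the ClusterRoleBinding is bound to the namespace's default ServiceAccount for brevity:

```yaml
# Condensed sketch of prometheus-rbac.yaml, prometheus-configmap.yaml and deployment.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: default
    namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
    scrape_configs:
      - job_name: kubernetes-services
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          # Only keep Services annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          # Optional overrides for the metrics path and port
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.11.2
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/data
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: data
              mountPath: /data
      volumes:
        - name: config
          configMap:
            name: prometheus-config
        - name: data
          emptyDir: {}
```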
Now we can deploy all the above-mentioned resources and check their availability:
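For example, if the manifests above are split into separate files (the filenames are assumptions, chosen to match the deployment.yaml referenced later):

```bash
kubectl apply -f prometheus-rbac.yaml -f prometheus-configmap.yaml -f deployment.yaml
kubectl get deployment,configmap,pods -n monitoring
```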
Then make your Prometheus Deployment locally available and check the metrics in your browser:
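For instance:

```bash
# Forward the Prometheus UI locally (Deployment and namespace from the sketch above)
kubectl port-forward -n monitoring deploy/prometheus 9090:9090
```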
If you go to http://localhost:9090/service-discovery, you will find out that Prometheus did discover a lot of Kubernetes Services (that's a good start!), but none of them are active. Why?
Because if you have a look at the Prometheus ConfigMap above, you will see that a Kubernetes Service discovered by Prometheus needs a few annotations before Prometheus will scrape it ("scraping" meaning fetching metrics from it):
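With the relabelling rules sketched earlier, these are the usual community-convention annotations (the exact keys must match whatever your ConfigMap expects):

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"   # opt the Service in to scraping
    prometheus.io/port: "9419"     # port exposing the metrics
    prometheus.io/path: "/metrics" # path serving the metrics
```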
This means that we need to patch the RabbitMQ exporter Service before we can see its metrics in Prometheus, for example with the following method:
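One possible method is to simply add the annotations in place (the Service name below is the one generated by the Helm release above and is an assumption):

```bash
kubectl annotate service -n monitoring rabbitmq-exporter-prometheus-rabbitmq-exporter \
  prometheus.io/scrape="true" \
  prometheus.io/port="9419" \
  prometheus.io/path="/metrics" \
  --overwrite
```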
If you restart the Prometheus Pod, reload the port-forward and refresh the URL (http://localhost:9090/service-discovery), things should look better now:
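One way to trigger that restart, assuming the Deployment and namespace from the sketches above:

```bash
kubectl rollout restart deployment/prometheus -n monitoring
kubectl port-forward -n monitoring deploy/prometheus 9090:9090
```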
Deploy Stackdriver Prometheus sidecar
Now that we have Prometheus collecting RabbitMQ metrics, we can use the integration between Prometheus and Stackdriver to make those metrics available in Stackdriver. This integration is performed using a sidecar container called stackdriver-prometheus-sidecar.
Note: you need to be really careful in this section, because the following points are harder to debug.
So we will use the Prometheus deployment and add the sidecar definition to it:
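A sketch of the extra container to add under spec.template.spec.containers in the Prometheus Deployment above; the image path is the one documented for the sidecar (verify the tag), the placeholders in <> must be replaced, and the WAL path assumes the --storage.tsdb.path=/data setting used earlier:

```yaml
# Additional sidecar container in the Prometheus Deployment
- name: stackdriver-prometheus-sidecar
  image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.6.3
  args:
    - --stackdriver.project-id=<GCP_PROJECT_ID>
    - --prometheus.wal-directory=/data/wal
    - --stackdriver.kubernetes.location=<GCP_REGION>
    - --stackdriver.kubernetes.cluster-name=<GKE_CLUSTER_NAME>
    - --stackdriver.generic.location=<GCP_REGION>
    - --stackdriver.generic.namespace=<GKE_CLUSTER_NAME>
  volumeMounts:
    # Shared with the Prometheus container so the sidecar can read the WAL
    - name: data
      mountPath: /data
```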
There are two things to notice here:
- The Prometheus data volume is mounted in both the Prometheus and the sidecar containers, because the sidecar accesses the collected metrics through the write-ahead log (WAL) files written by Prometheus. The sidecar can't work without it.
- The options we gave to the sidecar container:
  - --stackdriver.project-id: GCP project ID
  - --prometheus.wal-directory: directory containing the WAL files written by Prometheus
  - --stackdriver.kubernetes.location: GCP region
  - --stackdriver.kubernetes.cluster-name: GKE cluster name
  - --stackdriver.generic.location: location attached to metrics sent with the generic monitored resource (e.g. the GCP region)
  - --stackdriver.generic.namespace: here we put the name of the cluster because, unfortunately, in this case this will be the only field that allows us to distinguish metrics coming from different clusters
You can now update the Prometheus deployment with:
```bash
kubectl apply -f deployment.yaml
```
And then go to Stackdriver to see whether the new metrics have arrived:
- https://app.google.stackdriver.com/
- Go to your workspace
- Then go to “Resources” -> “Metrics Explorer”
Note: using Prometheus v2.11.2 + stackdriver-prometheus-sidecar v0.6.3 with this configuration, RabbitMQ metrics will be recognised as "Generic_Task"
You can see your metrics with some filters as in the example below
Adding another Kubernetes Service
Now let’s imagine that along with RabbitMQ you have another service that exposes Prometheus-compliant metrics through a Kubernetes Service. Maybe it’s your own application.
Then, if you want those metrics to be available in Stackdriver, you just need to patch your Service using the same kind of patch we used before, for example:
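A hypothetical example for a Service named my-app exposing metrics on port 8080 at /metrics (the name, namespace and port are placeholders):

```bash
kubectl annotate service -n default my-app \
  prometheus.io/scrape="true" \
  prometheus.io/port="8080" \
  prometheus.io/path="/metrics"
```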
Use metrics in Stackdriver
At this stage, exploiting the metrics is easy. You just need to go to the Stackdriver Metrics Explorer:
- https://app.google.stackdriver.com/
- Go to your workspace
- Then go to “Resources” -> “Metrics Explorer”
Then configure a chart with the following:
- Resource type: Generic_Task
- Metric name: just begin to type "rabbit" and Stackdriver will try to autocomplete
And when your chart is ready, save it to a new/existing Dashboard.
Pricing
Be careful when sending external metrics to Stackdriver, even for a test, because this is an expensive feature.
Stackdriver monitoring is not that expensive. Even native Kubernetes monitoring with Stackdriver is not expensive. But using external metrics is.
So my advice here would be to add service monitoring one by one, and not to send metrics to Stackdriver that are not essential.
As of today, I wouldn't say that getting external metrics associated with Kubernetes Services into Stackdriver is smooth. It requires some work, it doesn't exactly fit the data model, and if some debugging is required, it may prove difficult.
Yet it exists, it works and will hopefully get easier to use.
And it saves you from dealing with storage (amount of storage, resiliency, …) and cluster aggregation, which are much trickier problems than the ones we faced here.
If, however, you still want to try out other tools, you can read these posts about Kubernetes monitoring or Kubernetes productivity tips.