Note that for this to work we will use the following components:
These components will be connected as described in the diagram below.
We will deploy the RabbitMQ Prometheus exporter using Helm with the following command:
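As a sketch, such a command could look like the following, assuming Helm 2 and the stable/prometheus-rabbitmq-exporter chart (the release name, RabbitMQ management URL and credentials are placeholders, and the value names may differ between chart versions):

helm install --name rabbitmq-exporter stable/prometheus-rabbitmq-exporter \
  --set rabbitmq.url=http://rabbitmq:15672 \
  --set rabbitmq.user=guest \
  --set rabbitmq.password=guest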
This Helm chart deploys:
You can check that they are available:
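For example, with a release named rabbitmq-exporter:

kubectl get pods | grep rabbitmq-exporter
kubectl get svc | grep rabbitmq-exporter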
And then locally expose the service in order to check that the metrics are available:
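For example, assuming the chart created a Service named rabbitmq-exporter-prometheus-rabbitmq-exporter listening on port 9419 (adapt the name to whatever the previous command returned):

kubectl port-forward svc/rabbitmq-exporter-prometheus-rabbitmq-exporter 9419:9419 &
curl -s http://localhost:9419/metrics | head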
This will also give you an idea of which metrics are available (http://localhost:9419/metrics).
Note: The latest versions of RabbitMQ also offer a plugin that automatically exposes these metrics. If you use this plugin then you shouldn’t need the exporter.
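For reference, on RabbitMQ 3.8 and later this built-in plugin can be enabled with the command below; it then exposes metrics on port 15692 by default:

rabbitmq-plugins enable rabbitmq_prometheus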
These metrics are exposed in the format Prometheus expects, so let's deploy Prometheus next.
Prometheus can be deployed in many ways (e.g. with the Prometheus Operator), but in this case we will use a simple Deployment.
First, before actually deploying Prometheus, we need to deploy its configuration.
And since we will use the ability of Prometheus to perform Kubernetes Service discovery, we will need two things:
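The exact manifests are specific to each setup, but as a sketch: Prometheus needs RBAC permissions to list Services (a ServiceAccount bound to a suitable ClusterRole, omitted here), and a configuration that uses kubernetes_sd_configs together with relabel_configs keyed on the usual prometheus.io/* annotations. A minimal ConfigMap along those lines could look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
    scrape_configs:
      - job_name: kubernetes-services
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          # Only scrape Services annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Allow overriding the metrics path via prometheus.io/path
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # Allow overriding the scrape port via prometheus.io/port
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          # Keep some useful Kubernetes metadata as labels
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name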
Now we can deploy all the above-mentioned resources and check their availability:
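For example, assuming the resources above were saved as configmap.yaml, rbac.yaml and deployment.yaml, and that the Prometheus Pods carry an app=prometheus label:

kubectl apply -f configmap.yaml -f rbac.yaml -f deployment.yaml
kubectl get pods,svc -l app=prometheus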
Now make your Prometheus deployment locally available and check the metrics with your browser:
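For example (the Deployment name is an assumption):

kubectl port-forward deploy/prometheus 9090:9090

Then open http://localhost:9090 in your browser.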
Now go to http://localhost:9090/service-discovery.
You will find out that Prometheus has discovered a lot of Kubernetes Services (that's a good start!) but that none of them are active. Why?
Because if you have a look at the Prometheus ConfigMap above, you will see that a Kubernetes Service discovered by Prometheus needs a few annotations in order for Prometheus to scrape it ("scraping" meaning getting metrics from a service):
This means that we need to patch the RabbitMQ exporter Service before its metrics show up in Prometheus, for example with the following method:
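A sketch of such a patch, assuming the exporter Service is named rabbitmq-exporter-prometheus-rabbitmq-exporter and that the annotation names match the relabel configuration sketched earlier:

kubectl patch service rabbitmq-exporter-prometheus-rabbitmq-exporter \
  -p '{"metadata":{"annotations":{"prometheus.io/scrape":"true","prometheus.io/port":"9419","prometheus.io/path":"/metrics"}}}'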
If you restart the Prometheus Pod, restart the port-forward, and then reload the URL (http://localhost:9090/service-discovery), things should look better now:
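For example (again assuming an app=prometheus label on the Prometheus Pod):

kubectl delete pod -l app=prometheus
kubectl port-forward deploy/prometheus 9090:9090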
Now that we have Prometheus collecting RabbitMQ metrics, we can use the integration between Prometheus and Stackdriver to make those metrics available in Stackdriver. This integration is performed using a sidecar container called stackdriver-prometheus-sidecar.
Note: you need to be really careful in this section, because the following steps are harder to debug.
So we will use the Prometheus deployment and add the sidecar definition to it:
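A sketch of what the added container could look like, under spec.template.spec.containers of the Prometheus Deployment, assuming Prometheus writes its data to a shared volume mounted at /prometheus (the project ID, region and cluster name are placeholders):

      # Added alongside the existing prometheus container
      - name: stackdriver-prometheus-sidecar
        image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.6.3
        args:
          - --stackdriver.project-id=my-gcp-project
          - --prometheus.wal-directory=/prometheus/wal
          - --stackdriver.kubernetes.location=europe-west1
          - --stackdriver.kubernetes.cluster-name=my-gke-cluster
          - --stackdriver.generic.location=europe-west1
          - --stackdriver.generic.namespace=my-gke-cluster
        volumeMounts:
          # Must be the same volume the prometheus container uses for its data directory,
          # so that the sidecar can read the WAL files
          - name: prometheus-data
            mountPath: /prometheus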
There are two things to notice here:
--stackdriver.project-id: GCP project ID
--prometheus.wal-directory: will contain the WAL files written by Prometheus
--stackdriver.kubernetes.location: GCP region
--stackdriver.kubernetes.cluster-name: GKE cluster name
--stackdriver.generic.location:
--stackdriver.generic.namespace: here we put the name of the cluster because, unfortunately, this is the only field that will allow us to distinguish metrics from different clusters

You can now update the Prometheus deployment with:
kubectl apply -f deployment.yaml
And then go to Stackdriver to see if new metrics have arrived:
Note: using Prometheus v2.11.2 + stackdriver-prometheus-sidecar v0.6.3 with this configuration, RabbitMQ metrics will be recognised as "Generic_Task".
You can see your metrics with some filters, as in the example below.
Now let’s imagine that along with RabbitMQ you have another service that exposes Prometheus-compliant metrics through a Kubernetes Service. Maybe it’s your own application.
Then, if you want those metrics to be available in Stackdriver, you just need to patch your Service using the same kind of patch we used before.
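For example, with kubectl annotate on a hypothetical my-app Service that exposes metrics on port 8080:

kubectl annotate service my-app \
  prometheus.io/scrape=true \
  prometheus.io/port=8080 \
  prometheus.io/path=/metrics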
At this stage, exploiting the metrics is easy. You just need to go to the Stackdriver Metrics Explorer:
Then configure a chart with the following:
And when your chart is ready, save it to a new or existing dashboard.
Be careful when sending external metrics to Stackdriver, even if it's just for a test, because this is an expensive feature.
Stackdriver monitoring is not that expensive. Even native Kubernetes monitoring with Stackdriver is not expensive. But using external metrics is.
So my advice here would be to add monitored services one by one, and maybe not to send non-essential metrics to Stackdriver at all.
As of today, I wouldn't say that getting external metrics associated with Kubernetes Services into Stackdriver is smooth. It requires some work, it doesn't exactly fit the data model, and if debugging is needed it may prove difficult.
Yet it exists, it works and will hopefully get easier to use.
And it saves you from having to deal with storage (amount of storage, resiliency, …) and cluster aggregation, which are much trickier problems than the ones we faced here.
If, however, you still want to try out other tools, you can read these posts about Kubernetes monitoring or Kubernetes productivity tips.