Posted on 27 November 2019, updated on 6 August 2021.

When deploying a managed Kubernetes cluster, GCP automatically provides integration with Stackdriver for the monitoring of its resources (e.g. node and pod resources).

But did you know that it’s also possible to have Stackdriver monitor the Prometheus metrics of other applications and even your own applications? We will explain how in the following sections, with the monitoring of a RabbitMQ cluster.

Overview

Note that for this to work we will use the following components:

  • A GCP-hosted Stackdriver, with a workspace that receives metrics for project <GCP_PROJECT_ID>
  • A Prometheus pod that will collect metrics and then send them to Stackdriver
  • A RabbitMQ exporter compliant with Prometheus format (that could in fact be any other Prometheus exporter)
    • Note: for this method to work, the exporter needs to be accessible through a Kubernetes Service
  • A RabbitMQ cluster

These components will be connected as described in the diagram below.

[Diagram: how the RabbitMQ exporter, Prometheus, the Stackdriver sidecar and Stackdriver are connected]

Deploy the solution in a Kubernetes cluster

Deploy RabbitMQ exporter

We will deploy the RabbitMQ Prometheus exporter using Helm with the following command:

# URL of RabbitMQ service from within the cluster
RABBITMQ_URL=
RABBITMQ_USER=
# Kubernetes Secret name that contains a "RABBIT_PASSWORD" key
RABBITMQ_SECRET=
helm upgrade -i prometheus-rabbitmq-exporter stable/prometheus-rabbitmq-exporter --set "rabbitmq.url=$RABBITMQ_URL,rabbitmq.user=$RABBITMQ_USER,rabbitmq.existingPasswordSecret=$RABBITMQ_SECRET"
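
For illustration, the variables could be filled in as follows, assuming RabbitMQ's management API is reachable through a Service named rabbitmq in the default namespace and that its password lives in a Secret named rabbitmq-credentials (all three names are hypothetical; adapt them to your cluster):

# Hypothetical example values
RABBITMQ_URL=http://rabbitmq.default.svc.cluster.local:15672
RABBITMQ_USER=user
RABBITMQ_SECRET=rabbitmq-credentials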

This Helm chart deploys:

  • A Pod that will connect to RabbitMQ in order to get various metrics
  • A Service that will expose the metrics on port 9419 at URI /metrics

You can check that they are available:

kubectl get po | grep prometheus-rabbitmq-exporter
kubectl get svc | grep prometheus-rabbitmq-exporter

And then locally expose the service in order to check that the metrics are available:

kubectl port-forward svc/prometheus-rabbitmq-exporter 9419
curl localhost:9419/metrics

This will also give you an idea of which metrics are available (http://localhost:9419/metrics).
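
For a quicker sanity check than reading the whole list, you can grep for a single metric; rabbitmq_up, for instance, is exposed by this exporter and reports whether it can reach RabbitMQ:

# rabbitmq_up should be 1 when the exporter can talk to RabbitMQ
curl -s localhost:9419/metrics | grep "^rabbitmq_up"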

Note: The latest versions of RabbitMQ also offer a plugin that automatically exposes these metrics. If you use this plugin then you shouldn’t need the exporter.
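
For reference, on RabbitMQ 3.8 and later that plugin is rabbitmq_prometheus; it can be enabled with one command and serves metrics on port 15692 by default:

# RabbitMQ 3.8+: enable the built-in Prometheus plugin (no separate exporter needed)
rabbitmq-plugins enable rabbitmq_prometheus
# Metrics are then available on port 15692
curl -s localhost:15692/metrics | head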

These metrics are in the right format for Prometheus to consume. Let's deploy Prometheus then.

Deploy Prometheus and patch RabbitMQ exporter service

Prometheus can be deployed in many ways (e.g. with the Prometheus Operator), but in this case, we will use a simple Deployment.

First, before actually deploying Prometheus, we need to deploy its configuration.

And since we will use the ability of Prometheus to perform Kubernetes Service discovery, we will need two things:

  1. A ClusterRole and a ClusterRoleBinding that allow the Prometheus Pod to get, list and watch Kubernetes Services through the Kubernetes API

    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: prometheus-rabbitmq-exporter
    rules:
    - apiGroups: [""]
      resources:
      - services
      verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus-rabbitmq-exporter
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus-rabbitmq-exporter
    subjects:
    - kind: ServiceAccount
      name: default
      namespace: kube-system
  2. A ConfigMap that provides the Prometheus configuration

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-server-conf
      labels:
        name: prometheus-server-conf
    data:
      prometheus.yml: |-
        global:
          scrape_interval: 60s
          evaluation_interval: 60s
        scrape_configs:
        - job_name: 'kubernetes-services'
          kubernetes_sd_configs:
          - role: service
          scheme: http
          metrics_path: /metrics
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: _generic_namespace
And finally, the Prometheus deployment configuration:

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus-deployment
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-server
      template:
        metadata:
          labels:
            app: prometheus-server
        spec:
          containers:
          - name: prometheus
            image: prom/prometheus:v2.11.2
            args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
            ports:
            - containerPort: 9090
            volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
          volumes:
          - name: prometheus-config-volume
            configMap:
              defaultMode: 420
              name: prometheus-server-conf
          - name: prometheus-storage-volume
            emptyDir: {}

Now we can deploy all the above-mentioned resources and check their availability:

      kubectl apply -f role.yaml
      kubectl get ClusterRole prometheus-rabbitmq-exporter
      kubectl get ClusterRoleBinding prometheus-rabbitmq-exporter
      kubectl apply -f configmap.yaml
      kubectl get configmap prometheus-server-conf
      kubectl apply -f deployment.yaml
      kubectl get deployment prometheus-deployment

Now make your Prometheus deployment locally available and check it in your browser:

      kubectl port-forward $(kubectl get po -l app=prometheus-server -o name) 9090

Now, if you go to http://localhost:9090/service-discovery, you will find that Prometheus did discover a lot of Kubernetes Services (that's a good start!) but that none of them are active. Why?

Because if you have a look at the Prometheus ConfigMap above, you will see that a Kubernetes Service discovered by Prometheus needs a few annotations before Prometheus will scrape it ("scraping" meaning collecting metrics from a service):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: prometheus-server-conf
      data:
        ...
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
        ...

This means that we need to patch the RabbitMQ exporter Service before its metrics show up in Prometheus, for example with the following command:

      kubectl -n "kube-system" patch service prometheus-rabbitmq-exporter --type strategic --patch "
      metadata:
      annotations:
      prometheus.io/path: '/metrics'
      prometheus.io/scheme: 'http'
      prometheus.io/scrape: 'true'
      "

If you restart the Prometheus Pod and reload the port-forward, the URL (http://localhost:9090/service-discovery) should look better, as in the screenshot below.
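
One way to perform that restart, assuming the Deployment is still named prometheus-deployment and labelled app=prometheus-server as above:

      # Restart the Prometheus Pod so it is recreated now that the Service is annotated (kubectl >= 1.15)
      kubectl rollout restart deployment prometheus-deployment
      kubectl rollout status deployment prometheus-deployment
      # Then re-open the port-forward to the new Pod
      kubectl port-forward $(kubectl get po -l app=prometheus-server -o name) 9090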

[Screenshot: Prometheus service discovery page showing the RabbitMQ exporter Service as an active target]

Deploy Stackdriver Prometheus sidecar

Now that we have Prometheus collecting RabbitMQ metrics, we can use the integration between Prometheus and Stackdriver to make those metrics available in Stackdriver. This integration is performed by a sidecar container, stackdriver-prometheus-sidecar.

Note: you need to be really careful in this section, because the following steps are harder to debug.

So we will use the Prometheus deployment and add the sidecar definition to it:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: prometheus-deployment
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: prometheus-server
        template:
          metadata:
            labels:
              app: prometheus-server
          spec:
            containers:
            - name: prometheus
              image: prom/prometheus:v2.11.2
              args:
              - "--config.file=/etc/prometheus/prometheus.yml"
              - "--storage.tsdb.path=/prometheus/"
              ports:
              - containerPort: 9090
              volumeMounts:
              - name: prometheus-config-volume
                mountPath: /etc/prometheus/
              - name: prometheus-storage-volume
                mountPath: /prometheus/
            - name: sidecar-stackdriver
              image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.6.3
              imagePullPolicy: Always
              args:
              - "--stackdriver.project-id={{ .Values.gcp_project }}"
              - "--prometheus.wal-directory={{ .Values.data_dir }}/wal"
              - "--stackdriver.kubernetes.location={{ .Values.gcp_region }}"
              - "--stackdriver.kubernetes.cluster-name={{ .Values.kube_cluster }}"
              - "--stackdriver.generic.location={{ .Values.gcp_region }}"
              - "--stackdriver.generic.namespace={{ .Values.kube_cluster_name }}"
              ports:
              - name: sidecar
                containerPort: 9091
              volumeMounts:
              - name: {{ .Values.data_volume }}
                mountPath: {{ .Values.data_dir }}
            volumes:
            - name: prometheus-config-volume
              configMap:
                defaultMode: 420
                name: prometheus-server-conf
            - name: prometheus-storage-volume
              emptyDir: {}

There are two things to notice here:

      1. The Prometheus data volume is mounted in both the Prometheus and the sidecar containers, because the sidecar actually reads the collected metrics from files written by Prometheus: the write-ahead log (WAL). The sidecar can't work without it.

      2. The options we gave to the sidecar container (example values are sketched after this list):
        1. --stackdriver.project-id: GCP project ID
        2. --prometheus.wal-directory: the directory containing the WAL files written by Prometheus
        3. --stackdriver.kubernetes.location: GCP region
        4. --stackdriver.kubernetes.cluster-name: GKE cluster name
        5. --stackdriver.generic.location: GCP region as well
        6. --stackdriver.generic.namespace: here we put the name of the cluster because, unfortunately, in this case this will be the only field that allows us to distinguish metrics from different clusters
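
Since the manifest above uses Helm-style placeholders, here is a sketch of what the corresponding values could look like, assuming the Prometheus container defined earlier (data volume prometheus-storage-volume mounted at /prometheus); the project, region and cluster names are purely illustrative:

      # Illustrative values for the Helm placeholders used in the deployment above
      gcp_project: my-gcp-project             # GCP project ID receiving the metrics
      gcp_region: europe-west1                # GCP region of the GKE cluster
      kube_cluster: my-gke-cluster            # GKE cluster name
      kube_cluster_name: my-gke-cluster       # reused as the "generic" namespace
      data_dir: /prometheus                   # must match Prometheus --storage.tsdb.path
      data_volume: prometheus-storage-volume  # must match the Prometheus data volume name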

You can now update the Prometheus deployment with:

kubectl apply -f deployment.yaml
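
Before looking in Stackdriver, it is worth checking that the Pod now runs both containers and that the sidecar logs show no authentication or configuration errors (names as defined in the deployment above):

# The Pod should now report 2/2 containers ready
kubectl get po -l app=prometheus-server
# Check the sidecar logs for errors (credentials, WAL directory, cluster location, ...)
kubectl logs deployment/prometheus-deployment -c sidecar-stackdriver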

And then go to Stackdriver to see if some new metrics have arrived:

Note: using Prometheus v2.11.2 + stackdriver-prometheus-sidecar v0.6.3 with this configuration, RabbitMQ metrics will be recognised as "Generic_Task"

You can see your metrics with some filters, as in the example below:

[Screenshot: Stackdriver Metrics Explorer showing RabbitMQ metrics]

Adding another Kubernetes Service

Now let’s imagine that along with RabbitMQ you have another service that exposes Prometheus-compliant metrics through a Kubernetes Service. Maybe it’s your own application.

Then, if you want those metrics to be available in Stackdriver, you just need to patch your Service using the same kind of patch command we used before.
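
For example, for a hypothetical Service named my-app in the default namespace that exposes Prometheus metrics over HTTP at /metrics, the patch would look like this (adjust the namespace, path and scheme to your own service):

      kubectl -n "default" patch service my-app --type strategic --patch "
      metadata:
        annotations:
          prometheus.io/path: '/metrics'
          prometheus.io/scheme: 'http'
          prometheus.io/scrape: 'true'
      "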

Use metrics in Stackdriver

At this stage, exploiting the metrics is easy. You just need to go to the Stackdriver Metrics Explorer:

Then configure a chart with the following:

      • Resource type: Generic_Task
      • Metric name: just begin to type "rabbit" and Stackdriver will try to autocomplete

And when your chart is ready, save it to a new/existing Dashboard.

Pricing

Be careful when sending external metrics to Stackdriver, even for a test, because this is an expensive feature.

Stackdriver monitoring is not that expensive. Even native Kubernetes monitoring with Stackdriver is not expensive. But using external metrics is.

So my advice here would be to add service monitoring one by one, and to avoid sending metrics to Stackdriver that are not essential.
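
One way to do that, if I recall the sidecar's options correctly, is its --include flag, which takes a Prometheus-style series matcher so that only matching metrics are forwarded to Stackdriver. Treat the following as a sketch and check the stackdriver-prometheus-sidecar documentation for the exact syntax:

      # Sketch: extra argument for the sidecar-stackdriver container, so that only
      # RabbitMQ metrics are forwarded (syntax to be confirmed in the sidecar docs)
      args:
      - '--include={__name__=~"rabbitmq_.+"}'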

As of today, I wouldn't say that getting external metrics associated with Kubernetes Services into Stackdriver is smooth. It requires some work, it doesn't exactly fit the data model, and if some debugging is required it may prove difficult.

Yet it exists, it works and will hopefully get easier to use.

And it saves you from dealing with storage (capacity, resiliency, …) and cross-cluster aggregation, which are much trickier problems than the ones we faced here.

If, however, you still want to try out other tools, you can read these posts about Kubernetes monitoring or Kubernetes productivity tips.