Monitoring microservices

Overview

Having relevant, timely information about the state of the infrastructure and the systems running on it is crucial in a microservices architecture for ensuring stability, and thus keeping the end user satisfied. Health and performance metrics for each deployment let the team react to incidents faster, while collecting them has a negligible effect on the system itself.

Systems built on the microservice architecture pattern are harder to monitor: Kubernetes pods are ephemeral, so a fixed scrape configuration misses the mark. That's where service discovery comes in.

This tutorial walks you through building a microservice monitoring solution from scratch with dynamic service discovery inside a Kubernetes cluster using Prometheus and Grafana.

Most monitoring platforms are built on time-series databases, which are designed for storing time-based information such as application metrics. Some products only cover the database part (e.g. InfluxDB), while Prometheus handles collection and storage and offers basic built-in visualization. Even so, it's common to pair it with Grafana for richer dashboards.

So without further ado, let's get started!

Prerequisites

The tutorial assumes that you have a Kubernetes cluster set up and ready to go.
It's also handy to have a Spring Boot microservice lying around to monitor.

Publishing metrics from Spring Boot applications

Spring Boot's Actuator provides production-ready application metrics with sensible defaults and minimal setup; the Micrometer Prometheus registry exposes them in a format Prometheus can scrape.

Dependencies

Our first step is to add the required dependencies to the project's pom.xml:

<dependency>  
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>  
<dependency>  
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>  

Configuring Actuator

Actuator exposes a number of endpoints out of the box. To disable them by default and expose only the ones our monitoring setup needs, add this to your application.yml:

management:  
  endpoints:
    enabled-by-default: false
    web:
      base-path: /status
      exposure:
        include: prometheus, health

Disabling endpoints by default is advised in a production environment, as they can expose details about the system's underlying implementation over HTTP. Remapping the base path is optional.

(Optional) Publish custom application name

Some Grafana dashboards can group metrics per service if you publish the application name alongside the default metrics. The customizer below attaches it as a common tag to every meter:

import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MeterRegistryCustomizerConfiguration {

    @Value("${spring.application.name}")
    private String applicationName;

    // Tags every published meter with an "application" label carrying the
    // configured service name, so dashboards can filter and group by service.
    @Bean
    MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        return registry -> registry.config().commonTags("application", applicationName);
    }

}
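
This assumes spring.application.name is set. If it isn't yet, a minimal application.yml entry looks like this (the service name here is just an example):

spring:
  application:
    name: my-service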

Your application now exposes two additional endpoints, /status/health and /status/prometheus, to let you monitor it. You can verify locally with curl http://localhost:8080/status/prometheus (assuming the default port 8080), which should return metrics in the Prometheus text format.

Collecting the information

Prometheus is an open-source system for monitoring and alerting. We'll use it to collect application metrics.

Prometheus configuration

Prometheus is designed to be configured via command-line flags and a configuration file. Providing this configuration file as a ConfigMap is a simple but reliable way to set Prometheus up according to our needs. The essence of the service discovery mentioned earlier is relabeling: Prometheus discovers every pod through the Kubernetes API, then rewrites each pod's scrape address and path to the endpoint exposed by Spring Boot Actuator. This is done with simple regular expressions.

kind: ConfigMap  
apiVersion: v1  
metadata:  
  name: prometheus-config
  namespace: my-namespace
  labels:
    k8s-app: metrics-service
data:  
  prometheus.yml: |-
    scrape_configs:     
      - job_name: 'kubernetes-pods'

        kubernetes_sd_configs:
        - role: pod

        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name

Note the annotations referenced above: prometheus.io/scrape, prometheus.io/path, prometheus.io/port. We'll wire them in later. As an example of what the relabeling does: a pod at 10.1.2.3, annotated as in the snippet further below, ends up scraped at 10.1.2.3:8080/status/prometheus.
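
One optional tweak: Prometheus scrapes targets every minute by default, which can make dashboards feel sluggish. A global section at the top of prometheus.yml (above scrape_configs) can shorten the interval; 15s here is just a suggestion:

global:
  scrape_interval: 15s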

Setting up a service account for Prometheus

The service discovery relies on the Kubernetes API to list pods and their metadata. To give Prometheus read access, we create a ClusterRole and bind it to the service account the Prometheus pod runs under; for simplicity, this example binds it to the namespace's default account.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-kube
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding  
metadata:  
  name: prometheus-kube
roleRef:  
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-kube
subjects:  
  - kind: ServiceAccount
    name: default
    namespace: my-namespace
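
Binding to the default service account works, but a dedicated account keeps the permissions scoped. A minimal sketch (the name prometheus is our choice): create the account below, change the subject's name in the ClusterRoleBinding to prometheus, and set serviceAccountName: prometheus in the Prometheus pod spec that follows.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: my-namespace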

An example Prometheus deployment

Prometheus needs to be reachable inside the cluster so Grafana can use it as a datasource, but there is no need to expose it outside the cluster, so a ClusterIP service is enough.

kind: Service  
apiVersion: v1  
metadata:  
  name: prometheus-service
  namespace: my-namespace
  labels:
    k8s-app: prometheus-service
spec:  
  ports:
  - protocol: TCP
    port: 9090
    targetPort: 9090
  selector:
    k8s-app: prometheus-service
  type: ClusterIP

---

kind: Deployment  
apiVersion: apps/v1
metadata:  
  name: prometheus-service
  namespace: my-namespace
  labels:
    k8s-app: prometheus-service
spec:  
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus-service
  template:
    metadata:
      name: prometheus-service
      labels:
        k8s-app: prometheus-service
    spec:
      volumes:
      - name: prometheus-config-volume
        configMap:
          name: prometheus-config
      containers:
      - name: prometheus
        image: prom/prometheus
        imagePullPolicy: Always
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-config-volume
          mountPath: /etc/prometheus
        args: ["--config.file=/etc/prometheus/prometheus.yml"]
      restartPolicy: Always
      dnsPolicy: ClusterFirst

It is advised to set a data retention policy via a container flag; on current Prometheus versions the flag is --storage.tsdb.retention.time (the older --storage.tsdb.retention spelling is deprecated).
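
For example, the args line in the deployment above would become (32 days is an arbitrary choice):

        args: ["--config.file=/etc/prometheus/prometheus.yml", "--storage.tsdb.retention.time=32d"]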

Updating your deployments

The expressions showed previously utilize the annotations defined in your Kubernetes deployments. The regex based relabeling allows the actual pod scrape endpoint to be configured via the following annotations:
1. prometheus.io/scrape: Set the value to true to enable scraping of pods belonging to the deployment
2. prometheus.io/path: If the metrics path is not /metrics override this. In our example, it's /status/prometheus.
3. prometheus.io/port: Scrape the pod on the indicated port instead of the default of 9102.

spec:  
  template:
    metadata:
      annotations:
        prometheus.io/path: '/status/prometheus'
        prometheus.io/port: '8080'
        prometheus.io/scrape: 'true'
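
After rolling out the annotated deployments, it's worth checking that the pods actually show up as targets. One way, assuming the manifests above were applied to my-namespace: run kubectl port-forward -n my-namespace svc/prometheus-service 9090:9090 and open http://localhost:9090/targets in a browser; the annotated pods should appear under the kubernetes-pods job.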

Adding Grafana to the cluster

To access your Grafana dashboard from outside the cluster, expose it through a NodePort service. Note that node ports must fall within the cluster's service node port range (30000-32767 by default), so we map Grafana's container port 3000 to node port 30300 here.

kind: Service  
apiVersion: v1  
metadata:  
  name: grafana-service
  namespace: my-namespace
  labels:
    k8s-app: grafana-service
spec:  
  ports:
  - protocol: TCP
    port: 3000
    targetPort: 3000
    nodePort: 30300
  selector:
    k8s-app: grafana-service
  type: NodePort

---

kind: Deployment  
apiVersion: apps/v1
metadata:  
  name: grafana-service
  namespace: my-namespace
  labels:
    k8s-app: grafana-service
spec:  
  replicas: 1
  selector:
    matchLabels:
      k8s-app: grafana-service
  template:
    metadata:
      name: grafana-service
      labels:
        k8s-app: grafana-service
    spec:
      containers:
      - name: grafana
        image: grafana/grafana
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: D4vidH4ss3lh0ff1sC00L
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_USERS_ALLOW_SIGN_UP
          value: "false"
      restartPolicy: Always
      dnsPolicy: ClusterFirst
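
Hard-coding the admin password in the manifest is fine for a demo, but for anything real it's better pulled from a Secret. A minimal sketch, assuming a Secret named grafana-admin with a password key (created e.g. with kubectl create secret generic grafana-admin --from-literal=password=... -n my-namespace):

        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              name: grafana-admin  # hypothetical Secret name
              key: password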

Connecting Grafana to Prometheus

After you've applied the previous manifests, Grafana should be accessible at http://your-node.example.com:30300/login with the credentials we defined earlier.

The next step is to configure Grafana to use Prometheus as a datasource. In the datasource configuration wizard, add a Prometheus datasource and point its URL at the cluster-internal service we created earlier; since Grafana and Prometheus run in the same namespace, http://prometheus-service:9090 works.
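
If you'd rather skip the wizard, Grafana can also pick up datasources from provisioning files placed under /etc/grafana/provisioning/datasources (mounted from a ConfigMap, for instance). A sketch of such a file:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-service:9090
    isDefault: true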

Picking a Grafana dashboard from the store

Grafana has a public dashboard repository with premade dashboards. For Spring Boot applications, a popular choice is the JVM (Micrometer) dashboard (ID 4701), which works with the default Micrometer metrics and the application tag we set up above.


Conclusion

In this tutorial, we explored how to collect application metrics from Spring Boot services inside a Kubernetes cluster using Prometheus with dynamic service discovery, and how to visualize them with Grafana. Congratulations! You have just set up a microservice monitoring infrastructure!