Kubernetes Horizontal Pod Autoscaler using external metrics
Friday, April 23rd 2021

Scaling out in a k8s cluster is the job of the Horizontal Pod Autoscaler, or HPA for short. The HPA allows users to scale their application based on a plethora of metrics such as CPU or memory utilization. This is all well and good, but what happens if you want to scale out your application based on an application-specific business metric?

This is where using an HPA with the External Metrics API comes in. It allows users to scale their applications based on what they already know to be their key metric: the one that will make or break the application when it is under load.

This metric might not be CPU or memory. Luckily, Kubernetes allows users to "import" such metrics into the External Metrics API and use them with an HPA.

In this example we will create an HPA that scales our application based on Kafka topic lag.

It is based on the following software:

  • Kafka: The broker of our choice.
  • Prometheus: For gathering metrics.
  • Kafka-lag-exporter: For computing Kafka topic lag and exporting the metrics to Prometheus.
  • Prometheus-adapter: For querying Prometheus for stats and providing them to the External Metrics API.

Installation and configuration will be performed with Helm v3.
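
If the monitoring and kafka namespaces used below do not exist yet, create them first (or add --create-namespace to the helm install commands):

$ kubectl create namespace monitoring
$ kubectl create namespace kafka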

First of all we are going to install Prometheus in the monitoring namespace:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm -n monitoring install prometheus prometheus-community/prometheus

Prometheus is now deployed in the monitoring namespace:

$ helm -n monitoring list
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
prometheus              monitoring      1               2021-04-21 12:57:56.7438022 +0300 EEST  deployed        prometheus-13.8.0               2.26.0
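
To sanity-check the installation we can port-forward the Prometheus server service (named prometheus-server by this chart) and browse to http://localhost:9090:

$ kubectl -n monitoring port-forward svc/prometheus-server 9090:80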

Next we are going to install Kafka in the kafka namespace using the Bitnami chart:

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm repo update
$ helm -n kafka install kafka bitnami/kafka --set metrics.kafka.enabled=true

We enable metrics.kafka.enabled to deploy a standalone Kafka exporter, as per the chart docs: https://artifacthub.io/packages/helm/bitnami/kafka
Kafka is now deployed in the kafka namespace:

$ helm -n kafka list
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
kafka   kafka           1               2021-04-21 12:59:14.2364526 +0300 EEST  deployed        kafka-12.17.4   2.8.0
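
The metric we query later reports lag for a topic named prices. If you want to create that topic by hand, one option (a sketch, assuming the Bitnami image ships the standard Kafka CLI tools on its PATH) is to use a throwaway client pod:

$ kubectl -n kafka run kafka-client --restart=Never --image docker.io/bitnami/kafka --command -- sleep infinity
$ kubectl -n kafka exec -it kafka-client -- kafka-topics.sh --create --topic prices --bootstrap-server kafka.kafka:9092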

Now that both Kafka and Prometheus are installed, we need a way to get topic and consumer-group lag information into Prometheus.
We do this using kafka-lag-exporter:

$ helm repo add kafka-lag-exporter https://lightbend.github.io/kafka-lag-exporter/repo/
$ helm -n monitoring install kafka-lag-exporter kafka-lag-exporter/kafka-lag-exporter --set clusters[0].name=sm0ke-cluster --set clusters[0].bootstrapBrokers=kafka.kafka:9092

Kafka-lag-exporter needs to be told where to find our Kafka cluster. This is achieved by specifying the clusters[0].name and clusters[0].bootstrapBrokers properties. Feel free to adjust these settings to your liking.
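
The same settings can live in a values file instead of --set flags, which is easier to keep in version control. The equivalent values file (saved, for example, as kafka_lag_exporter_values.yaml) would be:

clusters:
  - name: sm0ke-cluster
    bootstrapBrokers: kafka.kafka:9092

$ helm -n monitoring install kafka-lag-exporter kafka-lag-exporter/kafka-lag-exporter -f kafka_lag_exporter_values.yaml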

The last piece of the puzzle is prometheus-adapter. It will query Prometheus using a predefined query and register the resulting metrics with the external metrics API server.

$ helm -n monitoring install prometheus-adapter prometheus-community/prometheus-adapter -f prometheus_adapter_values.yaml

We use the following prometheus_adapter_values.yaml configuration:

logLevel: 4
prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local
  port: 80
rules:
  external:
  - seriesQuery: '{__name__=~"^kafka_consumergroup_group_lag"}'
    resources:
      template: <<.Resource>>
    name:
      matches: ""
      as: "kafka_lag_metric"
    metricsQuery: 'avg by (topic) (round(avg_over_time(<<.Series>>[1m])))'

This configuration instructs prometheus-adapter to discover the series matched by seriesQuery in Prometheus and expose the value computed by metricsQuery as a new metric named kafka_lag_metric in the External Metrics API.
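
With this rule, <<.Series>> resolves to the matched series name, so the query the adapter ends up sending to Prometheus is roughly:

avg by (topic) (round(avg_over_time(kafka_consumergroup_group_lag[1m])))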

We can now query the external metrics API directly to determine if our configuration indeed works.

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/kafka_lag_metric" | jq
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "kafka_lag_metric",
      "metricLabels": {
          "topic": "prices"
      },
      "timestamp": "2021-04-22T16:13:36Z",
      "value": "400m"
    }
  ]
}
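
Note that the value is reported in Kubernetes quantity notation, so 400m means 0.4. If the query above returns nothing, it can help to list everything the adapter has registered with the external metrics API:

$ kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq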

For the sake of this tutorial I have created a sample application using Quarkus that consumes from a Kafka topic. Here's the deployment configuration. It includes the deployment as well as a service so that we can reach the application.

apiVersion: apps/v1
kind: Deployment
metadata:
  name:  kafkademo
  namespace: default
  labels:
    app:  kafkademo
spec:
  selector:
    matchLabels:
      app: kafkademo
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app:  kafkademo
    spec:
      containers:
      - name:  kafkademo
        image:  k3d-sm0ke-cluster-registry:5000/kafka-demo:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort:  8080
          name:  kafkademo
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: kafkademo
  namespace: default
spec:
  selector:
    app: kafkademo
  type: ClusterIP
  ports:
  - name: kafkademo
    port: 15555
    targetPort: 8080
    protocol: TCP
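
We apply this manifest (here assumed to be saved as deployment.yaml):

$ kubectl apply -f deployment.yaml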

Last but not least comes the HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: kafkademo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafkademo
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: External
    external:
      metricName: kafka_lag_metric
      targetValue: 4

We save this manifest as hpa.yaml and apply it:

$ kubectl apply -f hpa.yaml

This HPA targets the kafkademo deployment and defines a minimum of 3 replicas and a maximum of 12, with a target value of 4 for the kafka_lag_metric external metric.
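
The autoscaling/v2beta1 API used above is deprecated in newer Kubernetes versions. As a sketch, the same HPA expressed with the autoscaling/v2 API would look like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafkademo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafkademo
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: External
    external:
      metric:
        name: kafka_lag_metric
      target:
        type: Value
        value: "4"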

Now, when we ask for more information about our HPA, we see the following:

$ kubectl describe hpa kafkademo-hpa
Name:                                       kafkademo-hpa
Namespace:                                  default
Labels:                                     <none>
Annotations:                                <none>
CreationTimestamp:                          Wed, 21 Apr 2021 20:33:24 +0300
Reference:                                  Deployment/kafkademo
Metrics:                                    ( current / target )
  "kafka_lag_metric" (target value):  2 / 4
Min replicas:                               3
Max replicas:                               12
Deployment pods:                            12 current / 12 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from external metric kafka_lag_metric(nil)
  ScalingLimited  True    TooManyReplicas      the desired replica count is more than the maximum replica count
Events:           <none>

Here you can see that the HPA is currently in the process of scaling down the kafkademo deployment, since the current metric value (2) is below the defined target (4).
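
A simple way to watch the replica count converge is to keep an eye on the HPA:

$ kubectl get hpa kafkademo-hpa --watch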