Kubernetes Horizontal Pod Autoscaler using external metrics
Friday, April 23rd 2021

Scaling out in a k8s cluster is the job of the Horizontal Pod Autoscaler, or HPA for short. The HPA allows users to scale their application based on a plethora of metrics such as CPU or memory utilization. This is all well and good, but what happens if you want to scale out your application based on an application-specific business metric?

This is where using an HPA with the External Metrics API comes in. It allows users to scale their applications based on what they already know to be their key metric: the one that will make or break the application when it is under load.

That metric might not be CPU or memory. Luckily, Kubernetes allows users to "import" such metrics into the External Metrics API and use them with an HPA.

In this example we will create an HPA that scales our application based on Kafka topic lag.

It is based on the following software:

  • Kafka: The broker of our choice.
  • Prometheus: For gathering metrics.
  • Kafka-lag-exporter: For computing Kafka topic lag and exposing it as Prometheus metrics.
  • Prometheus-adapter: For querying Prometheus for these metrics and serving them through the External Metrics API.

Installation and configuration will be performed with Helm v3.
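The helm commands below assume the monitoring and kafka namespaces already exist. If they don't, create them first (or pass --create-namespace to helm install):

$ kubectl create namespace monitoring
$ kubectl create namespace kafka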

First of all we are going to install Prometheus in the monitoring namespace:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts  
$ helm repo update  
$ helm -n monitoring install prometheus prometheus-community/prometheus  

Prometheus is now deployed in the monitoring namespace:

$ helm -n monitoring list  
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION  
prometheus monitoring 1 2021-04-21 12:57:56.7438022 +0300 EEST deployed prometheus-13.8.0 2.26.0  
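The prometheus-adapter configuration further down points at the prometheus-server service created by this chart, so it is worth confirming the service name and port up front (a quick sanity check, assuming the default chart values):

$ kubectl -n monitoring get svc prometheus-server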

Next we are going to install Kafka in the kafka namespace using the Bitnami chart:

$ helm repo add bitnami https://charts.bitnami.com/bitnami  
$ helm repo update  
$ helm -n kafka install kafka bitnami/kafka --set metrics.kafka.enabled=true

We enable metrics.kafka.enabled so that the chart creates a standalone Kafka exporter, as described in the docs: https://artifacthub.io/packages/helm/bitnami/kafka
Kafka is now deployed in the kafka namespace:

$ helm -n kafka list  
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION  
kafka kafka 1 2021-04-21 12:59:14.2364526 +0300 EEST deployed kafka-12.17.4 2.8.0  
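Later in this post the demo application uses a topic named prices. If you want to follow along, you can create it now with the Kafka CLI bundled in the broker pod (a sketch; the pod name kafka-0 and the topic settings are assumptions based on the chart defaults used here):

$ kubectl -n kafka exec -it kafka-0 -- kafka-topics.sh --create \
    --bootstrap-server kafka.kafka:9092 --topic prices --partitions 3 --replication-factor 1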

Now that both Kafka and Prometheus are installed, we need a way to get topic lag information into Prometheus.
We do this with kafka-lag-exporter:

$ helm repo add kafka-lag-exporter https://lightbend.github.io/kafka-lag-exporter/repo/  
$ helm -n monitoring install kafka-lag-exporter kafka-lag-exporter/kafka-lag-exporter --set clusters[0].name=sm0ke-cluster --set clusters[0].bootstrapBrokers=kafka.kafka:9092  

Kafka-lag-exporter needs to be told where to find our Kafka cluster. This is achieved by specifying the clusters[0].name and clusters[0].bootstrapBrokers properties. Feel free to adjust these settings to your liking.
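The same settings can also be kept in a values file instead of --set flags, which is easier to maintain (a sketch; the file name is just an example, and the cluster name and brokers should match your setup):

clusters:
  - name: sm0ke-cluster
    bootstrapBrokers: kafka.kafka:9092

$ helm -n monitoring install kafka-lag-exporter kafka-lag-exporter/kafka-lag-exporter -f kafka_lag_exporter_values.yaml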

The last piece of the puzzle is prometheus-adapter. It will query Prometheus using a predefined query and register the resulting metrics with the External Metrics API.

$ helm -n monitoring install prometheus-adapter prometheus-community/prometheus-adapter -f prometheus_adapter_values.yaml  

We use the following prometheus_adapter_values.yaml configuration:

logLevel: 4
prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local
  port: 80
rules:
  external:
  - seriesQuery: '{__name__=~"^kafka_consumergroup_group_lag"}'
    resources:
      template: <<.Resource>>
    name:
      matches: ""
      as: "kafka_lag_metric"
    metricsQuery: 'avg by (topic) (round(avg_over_time(<<.Series>>{<<.LabelMatchers>>}[1m])))'

This configuration instructs prometheus-adapter to discover the series matched by seriesQuery in Prometheus and expose the values computed by metricsQuery as a new metric in the External Metrics API named kafka_lag_metric.
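To see what actually gets sent to Prometheus, it helps to expand the template by hand: the adapter substitutes <<.Series>> with the matched series name and <<.LabelMatchers>> with the label selectors it derives from the request, so the query ends up looking roughly like this (a sketch you can paste into the Prometheus UI to sanity-check the numbers before involving the HPA):

avg by (topic) (round(avg_over_time(kafka_consumergroup_group_lag[1m])))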

We can now query the external metrics API directly to verify that our configuration works.

$ kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/*/kafka_lag_metric | jq
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "kafka_lag_metric",
      "metricLabels": {
        "topic": "prices"
      },
      "timestamp": "2021-04-22T16:13:36Z",
      "value": "400m"
    }
  ]
}
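Note that the value uses Kubernetes quantity notation, so 400m means 0.4. If the list comes back empty, kafka-lag-exporter most likely has no consumer group to report on yet; once a consumer group exists, producing messages faster than they are consumed will make the lag move (a sketch using the console producer shipped in the broker pod; the pod and topic names are assumptions):

$ kubectl -n kafka exec -it kafka-0 -- kafka-console-producer.sh \
    --bootstrap-server kafka.kafka:9092 --topic prices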

For the sake of this tutorial I have created a sample application using Quarkus that consumes messages from Kafka. Here is the deployment configuration; it includes the Deployment as well as a Service so that we can reach the application.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafkademo
  namespace: default
  labels:
    app: kafkademo
spec:
  selector:
    matchLabels:
      app: kafkademo
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: kafkademo
    spec:
      containers:
      - name: kafkademo
        image: k3d-sm0ke-cluster-registry:5000/kafka-demo:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 8080
          name: kafkademo
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: kafkademo
  namespace: default
spec:
  selector:
    app: kafkademo
  type: ClusterIP
  ports:
  - name: kafkademo
    port: 15555
    targetPort: 8080
    protocol: TCP
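Apply the manifest and, if you want to poke at the application, port-forward the service (the file name here is just an example):

$ kubectl apply -f kafkademo.yaml
$ kubectl port-forward -n default svc/kafkademo 15555:15555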
Last but not least comes the HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: kafkademo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafkademo
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: External
    external:
      metricName: kafka_lag_metric
      targetValue: 4
$ kubectl apply -f hpa.yaml

This HPA targets the kafkademo deployment and defines a minimum of 3 replicas and a maximum of 12, with a target metric value of 4.
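As a side note, autoscaling/v2beta1 has since been removed from newer Kubernetes versions; on a recent cluster the same HPA would be expressed with the autoscaling/v2 API, roughly like this (a sketch, not tested here):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafkademo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafkademo
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: External
    external:
      metric:
        name: kafka_lag_metric
      target:
        type: Value
        value: "4"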

Now when we try to get some more information about our HPA we will see the following:

$ kubectl describe hpa kafkademo-hpa  
Name: kafkademo-hpa  
Namespace: default  
Labels: <none>  
Annotations: <none>  
CreationTimestamp: Wed, 21 Apr 2021 20:33:24 +0300  
Reference: Deployment/kafkademo  
Metrics: ( current / target )  
 "kafka_lag_metric" (target value): 2 / 4  
Min replicas: 3  
Max replicas: 12  
Deployment pods: 12 current / 12 desired  
Conditions:  
 Type Status Reason Message  
 ---- ------ ------ -------  
 AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation  
 ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from external metric kafka_lag_metric(nil)  
 ScalingLimited True TooManyReplicas the desired replica count is more than the maximum replica count  
Events: <none>

Here you can see that the HPA is in the process of scaling the kafkademo deployment back down, since the current metric value (2) is below the defined target (4).
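You can watch the replica count converge as the lag changes:

$ kubectl get hpa kafkademo-hpa -w
$ kubectl get deployment kafkademo -w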