Posted on 31 August 2023, updated on 11 December 2023.
In this little blog post, we're going to discover how to automatically scale your Kubernetes applications in an event-driven way using KEDA.
Why do we want to automatically scale our applications?
As an SRE, you are responsible for the optimal functioning of applications, their resilience, and their availability. Autoscaling is a concept that answers this need: you want to be sure your workloads handle traffic smoothly.
What is KEDA?
KEDA, or Kubernetes-based Event Driven Autoscaler, is a Kubernetes controller that will autoscale your applications based on the number of events needing to be processed.
KEDA is based on the concept of Scalers, which are types of triggers or event sources from which we want to scale our applications.
From your side, the only thing to do is to configure a ScaledObject (a KEDA CRD) by choosing the scaler you want to use to automatically scale your application, as well as a few parameters, and KEDA will do the rest for you:
- Monitor event sources
- Create and manage the HPA lifecycle
As of today, there are 62 built-in scalers and 4 external scalers available.
What makes KEDA nice is that it is a lightweight component and that it relies on native Kubernetes resources such as the HorizontalPodAutoscaler. From my point of view, its "Plug and Play" approach is just wonderful 🤩.
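To give you a first idea of what this looks like, here is a minimal ScaledObject sketch. The trigger type and its metadata are placeholders (we'll configure a real one later); minReplicaCount and maxReplicaCount are optional fields shown here with their defaults:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app
spec:
  scaleTargetRef:
    name: my-app # the Deployment to scale
  minReplicaCount: 0 # optional, default: 0
  maxReplicaCount: 100 # optional, default: 100
  triggers:
    - type: <scaler-name> # e.g., cron, rabbitmq, prometheus...
      metadata: {} # scaler-specific parameters go here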
Deploy KEDA
Well, the easiest way to deploy KEDA is to use their official Helm Chart as follows:
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
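If everything went well, you should see KEDA's components in the keda namespace; something along these lines (the exact list depends on the chart version, and the admission webhooks deployment only exists in recent versions):

❯ kubectl get deployments -n keda
NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
keda-operator                     1/1     1            1           1m
keda-operator-metrics-apiserver   1/1     1            1           1m
keda-admission-webhooks           1/1     1            1           1m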
⚠️ In case you're deploying KEDA with ArgoCD, you may encounter issues regarding the length of the CRDs' annotations. As a workaround, you can set the syncPolicy.syncOptions option ServerSideApply=true specifically for KEDA. You can also disable CRD deployment in the Helm chart's values file, but then you'll have to find another way to deploy the KEDA CRDs.
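For reference, here is a minimal sketch of an ArgoCD Application using this workaround (the project and chart version are illustrative, pick your own):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: keda
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://kedacore.github.io/charts
    chart: keda
    targetRevision: 2.12.0 # illustrative version
  destination:
    server: https://kubernetes.default.svc
    namespace: keda
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true # works around the CRD annotation length limit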
Automatically scale our sample web app based on the native cron scaler
It's time for us to play a little bit with KEDA!
Deploy our sample web app
For the demo, I'll use a small Golang web application that I made for this blog post. I deployed it using the following manifest:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: go-helloworld
  name: go-helloworld
spec:
  selector:
    matchLabels:
      app: go-helloworld
  template:
    metadata:
      labels:
        app: go-helloworld
    spec:
      containers:
        - image: rg.fr-par.scw.cloud/novigrad/go-helloworld:0.1.0
          name: go-helloworld
          resources:
            requests:
              cpu: "50m"
              memory: "64Mi"
            limits:
              memory: "128Mi"
              cpu: "100m"
---
apiVersion: v1
kind: Service
metadata:
  name: go-helloworld
spec:
  selector:
    app: go-helloworld
  ports:
    - protocol: TCP
      port: 8080
      name: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt
  name: go-helloworld
spec:
  rules:
    - host: helloworld.jourdain.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: go-helloworld
                port:
                  number: 8080
  tls: # < placing a host in the TLS config will indicate a certificate should be created
    - hosts:
        - helloworld.jourdain.io
      secretName: go-helloworld-tls-cert
Configure KEDA to automatically scale our web app on working hours only
Ok, let's imagine that we want our app to be available during working hours only. You might wonder why you would do this; there are several possible reasons.
For instance, in a development environment, there is not necessarily a need to keep applications up and running around the clock. In a cloud environment, it could save you a lot of money depending on the number of apps / compute instances you have.
Well, let's do this! 🤑
To achieve this, we're going to use KEDA's native Cron scaler. Since the Cron scaler supports the Linux cron format, it even lets us scale our application at specific times within the working day 😄
To configure the Cron scaler, we'll use the ScaledObject CRD as follows:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: go-helloworld
spec:
  scaleTargetRef:
    name: go-helloworld
  triggers:
    - type: cron
      metadata:
        timezone: Europe/Paris
        start: 00 08 * * 1-5
        end: 00 18 * * 1-5
        desiredReplicas: "2"
⚠️ The ScaledObject must be in the same namespace as your application!
Let's dive a little bit into this configuration:
- spec.scaleTargetRef is a reference to the Kubernetes Deployment, StatefulSet, or other custom resource you want to scale:
  - name (mandatory): the name of your Kubernetes resource
  - kind (optional): the kind of your Kubernetes resource; the default value is Deployment
- spec.triggers is a list of triggers that activate the scaling of the target resource:
  - type (mandatory): the scaler name
  - metadata (mandatory): the configuration parameters that the Cron scaler requires

With this configuration, my application will be up and running with two replicas from 08:00 until 18:00, Monday through Friday. Isn't that fantastic? 😀
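By the way, you can check that KEDA took our configuration into account by looking at the ScaledObject and at the HPA it creates and manages for us (KEDA names it keda-hpa-<scaledobject-name>):

❯ kubectl get scaledobject go-helloworld
❯ kubectl get hpa keda-hpa-go-helloworld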
Automatically scale our sample web app based on HTTP events with KEDA HTTP add-on (external scaler)
As you have seen, with all the scalers available, we can automatically scale our web application in many ways, for instance on the number of messages in an AMQP queue.
Now that you understand how KEDA works, we're going to explore how KEDA can help us handle traffic spikes by automatically scaling our application based on HTTP events. To do so, we have two choices:
- Use the Prometheus scaler
- Use the KEDA HTTP external scaler, which works like an add-on

Since I don't have Prometheus installed on my demo cluster, I'm going to use the KEDA HTTP external scaler (the perfect excuse to introduce you to an external scaler 🙄).
💡 The KEDA HTTP add-on is currently in beta. It is mainly maintained by the KEDA team.
Overview of the solution
The KEDA HTTP scaler is an add-on built on top of the KEDA core, and it comes with its own components: an operator, a scaler, and an interceptor. If you want to know more about their roles, feel free to read the official documentation. Anyway, to help you better understand how it all works, I made you a little diagram:
Install KEDA HTTP add-on
As this scaler is not a built-in one, we'll have to install it. As specified in the official documentation, we can install it with a Helm Chart:
helm install http-add-on kedacore/keda-add-ons-http --namespace keda
If everything goes well, you should see the following new pods:
❯ k get pods -l app=keda-add-ons-http -o name
pod/keda-add-ons-http-controller-manager-5c8d895cff-7jsl8
pod/keda-add-ons-http-external-scaler-57889786cf-r45lj
pod/keda-add-ons-http-interceptor-5bf6756df9-wwff8
pod/keda-add-ons-http-interceptor-5bf6756df9-x8l58
pod/keda-add-ons-http-interceptor-5bf6756df9-zxvw
Configure an HTTPScaledObject for our web app
As I said before, the KEDA HTTP add-on comes with its own components, including an operator, which also means that it comes with its own CRDs. The HTTPScaledObject is a CRD managed by the KEDA HTTP add-on, and it is what we need to configure here. Let's create the HTTPScaledObject resource for our web app:
⚠️ The HTTPScaledObject resource must be created in the same namespace as your web app!
kind: HTTPScaledObject
apiVersion: http.keda.sh/v1alpha1
metadata:
  name: go-helloworld
spec:
  host: "helloworld.jourdain.io"
  targetPendingRequests: 10
  scaledownPeriod: 300
  scaleTargetRef:
    deployment: go-helloworld
    service: go-helloworld
    port: 8080
  replicas:
    min: 0
    max: 10
Here, we have configured our HTTPScaledObject to scale our app's Deployment from 0 to 10 replicas: whenever there are 10 requests in a pending state on the interceptor (requests not yet taken by your application), KEDA adds a pod.
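As a quick sanity check (assuming no traffic has reached the app yet), once the scaledownPeriod has elapsed you should see the resource in place and the deployment sitting at zero replicas, along these lines:

❯ kubectl get httpscaledobject go-helloworld
❯ kubectl get deployment go-helloworld
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
go-helloworld   0/0     0            0           36m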
Adapt our web app's service and ingress resources
If you take a look at the diagram above, you can see that our web app Ingress will need to reference the KEDA HTTP add-on's interceptor service instead of the web app one. Since the Ingress can't reference a service in another namespace, we are going to create a service of type external
in the same namespace as our web app that references the interceptor service from the keda namespace:
kind: Service
apiVersion: v1
metadata:
  name: keda-add-ons-http-interceptor-proxy
spec:
  type: ExternalName
  externalName: keda-add-ons-http-interceptor-proxy.keda.svc.cluster.local
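If you want to make sure this ExternalName service resolves and routes correctly, you can run a throwaway curl pod in the web app's namespace (curlimages/curl is just a convenient public image; the interceptor routes requests based on the Host header):

❯ kubectl run tmp --rm -it --restart=Never --image=curlimages/curl -- \
    curl -s -H "Host: helloworld.jourdain.io" http://keda-add-ons-http-interceptor-proxy:8080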
Now, we need to re-configure the web app's ingress so that it refers to the newly created service:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt
  name: go-helloworld
spec:
  rules:
    - host: helloworld.jourdain.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keda-add-ons-http-interceptor-proxy
                port:
                  number: 8080
  tls: # < placing a host in the TLS config will indicate a certificate should be created
    - hosts:
        - helloworld.jourdain.io
      secretName: go-helloworld-tls-cert
⚠️ You need to reference the name of the new service, and pay attention to the port, which is also replaced by the interceptor service's port.
Let's try it!
To ensure that our configuration works well, I'm going to use k6, a load-testing tool. If you want to know more about k6, the Padok blog has a few resources on it.
Enough advertising; let's move on! 😁 Here is the k6 script I'll use for the test (with one or two changes):
import { check } from 'k6';
import http from 'k6/http';

export const options = {
  scenarios: {
    constant_request_rate: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s', // 100 iterations per second, i.e. 100 RPS
      duration: '30s',
      preAllocatedVUs: 50, // how large the initial pool of VUs would be
      maxVUs: 50, // if the preAllocatedVUs are not enough, we can initialize more
    },
  },
};

export function test() {
  // target the demo app's hostname from the Ingress above
  const res = http.get('https://helloworld.jourdain.io');
  check(res, {
    'is status 200': (r) => r.status === 200,
  });
}

export default function () {
  test();
}
First, let's see what happens with 100 constant RPS:
❯ k6 run k6/script.js
          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io
execution: local
script: k6/script.js
output: -
scenarios: (100.00%) 1 scenario, 50 max VUs, 1m0s max duration (incl. graceful stop):
* constant_request_rate: 100.00 iterations/s for 30s (maxVUs: 50, gracefulStop: 30s)
✓ is status 200
checks.........................: 100.00% ✓ 3001 ✗ 0
data_received..................: 845 kB 28 kB/s
data_sent......................: 134 kB 4.5 kB/s
http_req_blocked...............: avg=792.54µs min=0s med=1µs max=137.85ms p(90)=2µs p(95)=2µs
http_req_connecting............: avg=136.6µs min=0s med=0s max=17.67ms p(90)=0s p(95)=0s
http_req_duration..............: avg=11.38ms min=7.68ms med=10.68ms max=100.96ms p(90)=12.78ms p(95)=14.33ms
{ expected_response:true }...: avg=11.38ms min=7.68ms med=10.68ms max=100.96ms p(90)=12.78ms p(95)=14.33ms
http_req_failed................: 0.00% ✓ 0 ✗ 3001
http_req_receiving.............: avg=89.68µs min=8µs med=64µs max=6.35ms p(90)=112µs p(95)=134µs
http_req_sending...............: avg=152.31µs min=14µs med=137µs max=2.57ms p(90)=274µs p(95)=313µs
http_req_tls_handshaking.......: avg=587.62µs min=0s med=0s max=74.46ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=11.14ms min=7.62ms med=10.48ms max=100.92ms p(90)=12.47ms p(95)=13.96ms
http_reqs......................: 3001 99.983105/s
iteration_duration.............: avg=12.37ms min=7.73ms med=10.88ms max=194.89ms p(90)=13.07ms p(95)=14.99ms
iterations.....................: 3001 99.983105/s
vus............................: 1 min=1 max=1
vus_max........................: 50 min=50 max=50
running (0m30.0s), 00/50 VUs, 3001 complete and 0 interrupted iterations
constant_request_rate ✓ [======================================] 00/50 VUs 30s 100.00 iters/s
💡 If you want to see live how many requests the interceptor has in its queue, you can run the following commands in two terminal panes/tabs:
❯ kubectl proxy
Starting to serve on 127.0.0.1:8001
and:
❯ watch -n '1' curl --silent localhost:8001/api/v1/namespaces/keda/services/keda-add-ons-http-interceptor-admin:9090/proxy/queue
{"default/go-helloworld":0}
With the 100 RPS test, my application did not scale up because the number of pending requests in the interceptor queue did not exceed 1. As a reminder, we configured targetPendingRequests to 10. So everything seems normal 😁
Let's multiply our RPS by 10 and see what happens:
❯ k6 run k6/script.js
          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io
execution: local
script: k6/script.js
output: -
scenarios: (100.00%) 1 scenario, 50 max VUs, 1m0s max duration (incl. graceful stop):
* constant_request_rate: 1000.00 iterations/s for 30s (maxVUs: 50, gracefulStop: 30s)
✗ is status 200
↳ 99% — ✓ 11642 / ✗ 2
checks.........................: 99.98% ✓ 11642 ✗ 2
data_received..................: 2.6 MB 86 kB/s
data_sent......................: 446 kB 15 kB/s
dropped_iterations.............: 18356 611.028519/s
http_req_blocked...............: avg=1.07ms min=0s med=0s max=408.06ms p(90)=1µs p(95)=1µs
http_req_connecting............: avg=43.12µs min=0s med=0s max=11.05ms p(90)=0s p(95)=0s
http_req_duration..............: avg=120.09ms min=8.14ms med=74.77ms max=6.87s p(90)=189.49ms p(95)=250.21ms
{ expected_response:true }...: avg=120.01ms min=8.14ms med=74.76ms max=6.87s p(90)=189.41ms p(95)=249.97ms
http_req_failed................: 0.01% ✓ 2 ✗ 11642
http_req_receiving.............: avg=377.61µs min=5µs med=32µs max=27.32ms p(90)=758.1µs p(95)=2.49ms
http_req_sending...............: avg=61.57µs min=9µs med=45µs max=9.99ms p(90)=102µs p(95)=141µs
http_req_tls_handshaking.......: avg=626.79µs min=0s med=0s max=297.82ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=119.65ms min=7.95ms med=74.32ms max=6.87s p(90)=188.95ms p(95)=249.76ms
http_reqs......................: 11644 387.60166/s
iteration_duration.............: avg=121.26ms min=8.32ms med=74.87ms max=7.07s p(90)=189.62ms p(95)=250.28ms
iterations.....................: 11644 387.60166/s
vus............................: 44 min=25 max=50
vus_max........................: 50 min=50 max=50
running (0m30.0s), 00/50 VUs, 11644 complete and 0 interrupted iterations
constant_request_rate ✓ [======================================] 00/50 VUs 30s 1000.00 iters/s
Not that bad 🧐 I think the two failed requests are due to the application's cold start (it scaled up from zero) and to k6 not waiting more than a second or two per request.
Here is the deployment history:
❯ k get deployments.apps -w
NAME READY UP-TO-DATE AVAILABLE AGE
go-helloworld 0/0 0 0 36m
go-helloworld 0/1 0 0 36m
go-helloworld 1/1 1 1 36m
go-helloworld 1/4 1 1 36m
go-helloworld 2/4 4 2 36m
go-helloworld 3/4 4 3 36m
go-helloworld 4/4 4 4 36m
go-helloworld 4/5 4 4 37m
go-helloworld 5/5 5 5 37m
As you can see, the application scaled up from 0 to 5 replicas, until the number of pending requests for the web application dropped below 10. From what I observed, the scaling instructions were applied very quickly; the app reached 5 replicas in no time.
Here is a little comparison of the http_req_duration k6 metric between the 100 RPS and 1k RPS tests:
# 100 RPS
http_req_duration: avg=11.38ms min=7.68ms med=10.68ms max=100.96ms p(90)=12.78ms p(95)=14.33ms
# 1k RPS
http_req_duration: avg=120.09ms min=8.14ms med=74.77ms max=6.87s p(90)=189.49ms p(95)=250.21ms
Depending on our needs (SLOs, SLAs, etc.), we could fine-tune the targetPendingRequests parameter of our web application's HTTPScaledObject.
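For instance, if our SLOs require lower latency, we could lower the threshold so that scaling kicks in earlier, at the cost of running more pods. A hypothetical tweak (the value 5 is purely illustrative, not a recommendation):

kind: HTTPScaledObject
apiVersion: http.keda.sh/v1alpha1
metadata:
  name: go-helloworld
spec:
  host: "helloworld.jourdain.io"
  targetPendingRequests: 5 # scale out as soon as 5 requests are pending
  # rest of the spec unchanged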
Scale to zero!
With the two examples we've covered in this article, you may already have seen scale to zero in action. But do you know how it works?
Since KEDA automatically scales applications based on events, the moment an event is received, KEDA scales the application up to its minimum replica count. For instance, in the case of the HTTP add-on, KEDA scales up from zero at the first incoming request.
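You can observe this yourself with the HTTP add-on: watch the deployment while sending a single request to the app. You should see something along these lines:

❯ kubectl get deployment go-helloworld -w
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
go-helloworld   0/0     0            0           40m
# in another terminal: curl https://helloworld.jourdain.io
go-helloworld   0/1     0            0           40m
go-helloworld   1/1     1            1           40m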