Effortlessly Scale Kubernetes Workloads with Karpenter & KEDA! Cloud Scaling Series: EP03
This article is a follow-up in my series on Karpenter deployment. This time, we will explore how KEDA helps us define scaling thresholds in an easy and efficient way.
First, let’s understand what KEDA is and how it enables cloud environment scaling. As always, I’ll share an architecture diagram to illustrate this.
Now, let’s consider a scenario where your application experiences a sudden spike in traffic. Based on your scaling configurations, the application will begin spinning up replicas. Simultaneously, to ensure these additional replicas remain up and running, Karpenter will dynamically provision new nodes. However, during this process, if the pods start crashing one after another due to their inability to handle the traffic, this leads to what is known as a “Death Spiral.”
In the architecture diagram below, I’ll demonstrate how this situation occurs during peak traffic periods. A recent real-world example of this would be the Coldplay concert ticket sales rush.
Let’s go through this architecture diagram. First, you can see multiple requests coming into CloudFront, which then forwards them to the ALB. The ALB routes these requests to the tickets deployment via the NGINX Ingress Controller (not shown explicitly in the diagram, but that’s how the traffic flows).
Now, assume that, at the beginning, the tickets deployment is running with a minimum of two pods. As incoming requests increase, the deployment starts scaling. However, as the pods become overloaded, they begin crashing and entering a CrashLoopBackOff state, triggering further scaling. This cycle continues — overloading, crashing, and rescheduling — leading to what is known as a “Death Spiral.”
Keep in mind that since Karpenter is also deployed in the cluster, it attempts to scale nodes dynamically to accommodate the growing number of pods. However, this situation doesn’t just affect the tickets deployment. In a microservices architecture, such failures can propagate across different services, leading to a “Cascading Pods Crash” — where dependent microservices fail as well.
Ultimately, this not only impacts your business operations but also increases your cloud costs. Since cloud providers follow a pay-as-you-go model, provisioning unnecessary nodes results in higher bills.
So, how can we mitigate this kind of situation?
A similar issue occurred when BookMyShow India started selling online tickets for Coldplay’s January 2025 Tour in Mumbai. However, they managed to mitigate and control the issue successfully for the Ahmedabad show — which, by the way, is how I managed to grab my Coldplay tickets 😉.
Identifying the Root Cause
Let’s break down what happened:
✅ Microservices architecture was used for deployment — Good approach.
✅ Karpenter was already deployed to scale pods and nodes — Good approach.
✅ During high demand, pods scaled according to HPA (Horizontal Pod Autoscaler) configurations — Good approach.
❌ Pods crashed due to high demand, even with proper HPA thresholds. But why?
Even if we set higher HPA values, this issue can still occur. The key problem is that user requests need to be queued and processed gradually rather than hitting the system all at once. Ideally, the scaling strategy should align with this queue system to ensure controlled, efficient scaling.
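For context, a typical CPU-based HPA for a hypothetical tickets deployment might look like the sketch below; the names and numbers are illustrative, not taken from the real setup. Raising maxReplicas alone doesn’t help much, because CPU is a lagging signal compared to the flood of incoming requests:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tickets-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tickets          # the deployment from the diagram
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out once average CPU crosses 70%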
Solution: Re-architecting the Deployment with KEDA and Karpenter
Now, let’s redesign the deployment strategy by integrating KEDA with Karpenter to handle scaling dynamically and prevent these failures.
Now, as you can see, we have deployed KEDA along with a queue system to manage high-demand workloads. From this point onward, user requests no longer go directly to the tickets deployment but instead pass through the queue system first.
You might have seen similar queue systems during high-demand concert ticket sales. Sometimes — or rather, most of the time — users get frustrated when they receive a high queue position, like this:
Let’s break down our re-architected strategy using KEDA and Karpenter.
Once a user signs in to the ticket system, they can navigate to the specific show to purchase tickets. When they enter the ticket purchase section, they are automatically redirected to the queue system.
This queue system works alongside Amazon SQS (though you can also use Kafka or RabbitMQ, depending on your business requirements). It helps the queue deployment process user requests gradually, sending them to the tickets deployment in a controlled manner.
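As a rough sketch of that flow (queue URL and payload are hypothetical), the queue front end enqueues each ticket request as an SQS message, and consumers pull them at their own pace before forwarding them to the tickets deployment:

# Placeholder queue URL; replace with your own
QUEUE_URL="https://sqs.ap-south-1.amazonaws.com/123456789012/tickets-queue"

# Enqueue a ticket request as a message
aws sqs send-message \
  --queue-url "$QUEUE_URL" \
  --message-body '{"userId": "u-101", "showId": "coldplay-ahmedabad"}'

# Consumers pull messages in controlled batches
aws sqs receive-message --queue-url "$QUEUE_URL" --max-number-of-messages 10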
Where does KEDA fit in?
Once KEDA is deployed on the EKS cluster, it provides a ScaledObject custom resource (the scaling object). We can simply add our SQS queue URL to this object and configure KEDA’s scaling policies accordingly (as represented by the queue icon in the architecture diagram).
As demand increases, KEDA automatically scales pods, and Karpenter dynamically provisions nodes, ensuring a smooth and uninterrupted experience.
Additionally, I’d like to highlight something impressive I noticed — BookMyShow has integrated a real-time ticket availability update in their queue system’s frontend. This feature allows users to see which ticket categories still have seats available. Hats off to the We Are BookMyShow team for that awesome feature! 👏
Alright, in this article, I won’t be covering Karpenter deployment again, as no additional configuration is needed to connect it with KEDA. If you’re coming across this article first, I recommend checking out my Karpenter deployment article for a deeper understanding.
Deploying KEDA on Your EKS Cluster
Step 1: Add the KEDA Helm Repository
Run the following command to add the KEDA Helm repo:
helm repo add kedacore https://kedacore.github.io/charts
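If the repo was already added earlier, refresh the local chart index so the chart version pinned in the next step is available:

helm repo update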
Step 2: Install KEDA Using Helm
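The install command below references two environment variables. Here’s a hedged sketch of how they might be set; the chart version is only an example (check the KEDA releases for the latest), and the role ARN is a placeholder for an IRSA-enabled IAM role bound to the keda-operator service account with permission to read your SQS queue attributes:

# Example value only; pick the latest stable KEDA chart version
export KEDA_CHART_VERSION="2.14.2"

# Placeholder ARN: an IRSA role for the keda-operator service account
# with sqs:GetQueueAttributes on the tickets queue
export KEDA_ROLE_ARN="arn:aws:iam::123456789012:role/keda-operator-role"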
helm upgrade --install keda kedacore/keda \
--version "${KEDA_CHART_VERSION}" \
--namespace keda \
--create-namespace \
--set "podIdentity.aws.irsa.enabled=true" \
--set "podIdentity.aws.irsa.roleArn=${KEDA_ROLE_ARN}" \
--wait
Once successfully deployed, you’ll see a response similar to this:
Release "keda" does not exist. Installing it now.
NAME: keda
LAST DEPLOYED: […]
NAMESPACE: keda
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
[…]
After the Helm installation, KEDA will run as several deployments in the keda namespace. You can check the running deployments using:
kubectl get deployment -n keda
Expected output:
NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
keda-admission-webhooks           1/1     1            1           105s
keda-operator                     1/1     1            1           105s
keda-operator-metrics-apiserver   1/1     1            1           105s
KEDA Deployment Components and Their Roles
- Agent (keda-operator) – Controls the scaling of workloads.
- Metrics Server (keda-operator-metrics-apiserver) – Acts as a Kubernetes metrics server, providing access to external metrics.
- Admission Webhooks (keda-admission-webhooks) – Validates resource configurations to prevent misconfigurations (e.g., multiple ScaledObjects targeting the same workload).
Configuring KEDA to Scale Workloads
Once installed, KEDA creates several custom resources. One of the most important is ScaledObject, which allows you to map an external event source (such as AWS SQS) to a Deployment or StatefulSet for scaling.
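You can quickly confirm these custom resources are registered in the cluster; the exact list may vary slightly between KEDA versions:

# List the custom resource definitions installed by KEDA
kubectl get crd | grep keda.sh
# Typically includes scaledobjects.keda.sh, scaledjobs.keda.sh,
# triggerauthentications.keda.sh, and clustertriggerauthentications.keda.sh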
Now, let’s see how to add an SQS queue URL to the ScaledObject.
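Here’s a minimal sketch of such a ScaledObject for a hypothetical tickets deployment; the namespace, queue URL, region, and thresholds are placeholders rather than values from the real setup. Since we enabled IRSA on the KEDA operator during the Helm install, the trigger can reuse the operator’s role via identityOwner: operator.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: tickets-scaledobject
  namespace: tickets              # assumed namespace for the tickets deployment
spec:
  scaleTargetRef:
    name: tickets                 # the Deployment KEDA will scale
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.ap-south-1.amazonaws.com/123456789012/tickets-queue   # placeholder
        queueLength: "5"          # target number of messages per replica
        awsRegion: ap-south-1
        identityOwner: operator   # reuse the KEDA operator's IRSA role

Apply the manifest with kubectl apply, and KEDA will create and manage a corresponding HPA for the deployment behind the scenes.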
Once the queueLength threshold in the ScaledObject is exceeded, KEDA will automatically scale up additional pods to process Amazon SQS messages more quickly and bring the queue length back within the desired limits.
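To watch this in action (using the names from the sketch above), you can inspect the ScaledObject and the HPA that KEDA manages on its behalf:

kubectl get scaledobject tickets-scaledobject -n tickets
kubectl get hpa keda-hpa-tickets-scaledobject -n tickets   # KEDA names the HPA keda-hpa-<scaledobject-name>
kubectl get deployment tickets -n tickets -w               # replica count reacting to the queue backlog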
As the workload increases, Karpenter will dynamically provision new nodes, ensuring that the environment remains stable and responsive.
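On the node side, you can watch capacity being added as the extra pods become unschedulable (the NodeClaim and NodePool resources apply to Karpenter’s v1beta1/v1 APIs, so adjust for your version):

kubectl get nodes -w        # new nodes joining the cluster
kubectl get nodeclaims      # capacity Karpenter is provisioning
kubectl get nodepools       # the pools it is provisioning from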
Let’s continue the discussion in this scaling series! 🚀