Elastic Beanstalk is evil!

For all you AWS’ers out there, I’m hoping this will give you a reason to avoid the Elastic Beanstalk (EB) stack. A quick summary of “what is Elastic Beanstalk” from Stack Overflow:

Elastic Beanstalk is one layer of abstraction away from the EC2 layer. Elastic Beanstalk will set up an “environment” for you that can contain a number of EC2 instances, an optional database, as well as a few other AWS components such as an Elastic Load Balancer, Auto-Scaling Group, Security Group. Then Elastic Beanstalk will manage these items for you whenever you want to update your software running in AWS. Elastic Beanstalk doesn’t add any cost on top of these resources that it creates for you. If you have 10 hours of EC2 usage, then all you pay is 10 compute hours.

So I’ve recently finished delivering two projects using Elastic Beanstalk, and initially I was a fan. It brings everything into one dashboard / centralised control area: your load balancers (ELB), Auto Scaling groups (ASG), EC2 instances, notifications, monitoring, metrics and general orchestration.

It provides a way to get zero downtime deployments, using two stacks and performing CNAME swaps (Blue / Green deployments).
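For anyone who hasn’t seen a CNAME swap, here is a rough sketch of what the cut-over step might look like from Python. It assumes a boto3 Elastic Beanstalk client is passed in (created with boto3.client("elasticbeanstalk")); the function name and the environment names are made up for illustration, and this is not the setup we used on these projects.

```python
def swap_if_healthy(client, live_env, staged_env):
    """Swap CNAMEs between two environments, but only when the staged
    environment reports Status "Ready" and Health "Green".

    `client` is assumed to be a boto3 Elastic Beanstalk client.
    Returns True if the swap was requested, False if it was skipped.
    """
    resp = client.describe_environments(EnvironmentNames=[staged_env])
    envs = resp["Environments"]
    if not envs:
        raise ValueError(f"environment not found: {staged_env}")

    staged = envs[0]
    if staged["Status"] != "Ready" or staged["Health"] != "Green":
        # Do not send traffic to an environment that is not healthy.
        return False

    # Exchange the CNAMEs of the two environments.
    client.swap_environment_cnames(
        SourceEnvironmentName=live_env,
        DestinationEnvironmentName=staged_env,
    )
    return True
```

You would call it as swap_if_healthy(client, "myapp-blue", "myapp-green") once the new version has been deployed to the staged environment. Both names are hypothetical.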

Rather than creating two separate stacks for each environment, we decided to go with rolling updates as our “zero downtime deployment” (I am not going into the details of how we achieved this; email me if you are interested and I’ll be happy to post).

This worked really well for the first few months… until we used it in anger in production. If Elastic Beanstalk fails to deploy, it retries 2 times (3 attempts in total) and then attempts to roll back.

It rolls back by deploying the original version on new servers. If that fails, it tries 2 more times (again 3 attempts in total). If the rollback fails as well, the EB stack goes into a grey state. I call it the zombie state.
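To make that sequence concrete, here is a toy model of it in plain Python. This is only a sketch of the behaviour as I observed it, not Elastic Beanstalk’s actual implementation; the function names and state labels are my own.

```python
MAX_ATTEMPTS = 3  # one initial try plus two retries, as described above


def attempt(action, max_attempts=MAX_ATTEMPTS):
    """Call `action` until it returns True or the attempts run out."""
    for _ in range(max_attempts):
        if action():
            return True
    return False


def deploy_with_rollback(deploy_new, deploy_previous):
    """Model the observed sequence: deploy, then roll back, then give up.

    Both arguments are callables returning True on success.
    Returns "deployed", "rolled_back" or "grey" (the zombie state).
    """
    if attempt(deploy_new):
        return "deployed"
    if attempt(deploy_previous):
        return "rolled_back"
    # Both the deployment and the rollback used up all their attempts.
    return "grey"
```

If both callables keep failing (a bad health check will do that to the new and the old version alike), each is tried three times and the result is "grey".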

Let’s not go into why this could happen (there are many possible reasons, e.g. bad health checks).

Regardless of your underlying resources, your EB stack is deadlocked. You cannot recover it AT ALL.

Even if your underlying infrastructure is fine and the web server or API is working and serving users, EB will not accept any changes, new deployments or configuration updates until you tear down the physical resources and rebuild.

Essentially the abstraction layer on top of these services has deadlocked and it cannot be fixed.

As soon as you tear down your EB stack, there goes your IP, and welcome DNS caching issues.

My recommendation for anyone building serious production services – avoid the EB stack. Even though it seems great at the start, it will cause you pain as time wears on.