Suraj Kumar

We Went All-In on Serverless. Here Is What It Actually Cost Us.

The deployment story is as good as advertised. The lock-in, the bills, and the DynamoDB migrations are not.

Every few years the industry lands on a new abstraction that promises to let engineers focus on writing software instead of managing infrastructure. Serverless was the most compelling version of that pitch I had encountered. And for the first year, it genuinely delivered on it.

This is the story of what happened after that.

The Honeymoon Is Real

I want to be honest about this part because it is easy to write a post like this and retroactively make everything sound like a mistake. The early experience was not a mistake. It was genuinely good.

AWS CDK gives you infrastructure as code that reads like actual software rather than YAML archaeology. You define a Lambda function in TypeScript, attach it to an API Gateway, point it at a DynamoDB table, and you have a deployed API in a few dozen lines of code. It goes into source control. It runs through a GitHub Actions pipeline on every merge to main. It deploys in minutes, reproducibly, from a clean state.
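To make the "few dozen lines" claim concrete, here is a minimal sketch of that kind of stack in CDK v2. The construct names, the handler path, and the key names are illustrative, not from any real codebase:

```typescript
// Illustrative CDK v2 stack: Lambda + API Gateway + DynamoDB.
// Construct names, the asset path, and key names are hypothetical.
import { Stack, StackProps, RemovalPolicy } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

export class ApiStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Single-table store with on-demand billing.
    const table = new dynamodb.Table(this, 'ItemsTable', {
      partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'sk', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: RemovalPolicy.DESTROY, // dev stacks only
    });

    // Handler code lives alongside the infrastructure definition.
    const handler = new lambda.Function(this, 'ApiHandler', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: { TABLE_NAME: table.tableName },
    });

    // Least-privilege grant: read/write on this one table only.
    table.grantReadWriteData(handler);

    // REST API proxying every route to the Lambda.
    new apigateway.LambdaRestApi(this, 'Api', { handler });
  }
}
```

That really is the whole deployable surface: table, function, permissions, and API, reviewable in one pull request.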

That feedback loop felt like real production engineering. VPCs, subnets, security groups, IAM roles with least privilege, encrypted storage at rest, all expressed as code, all version-controlled, all reviewable in a pull request. New joiners could get a working environment running in an afternoon. You could tear the whole stack down and recreate it from scratch in ten minutes. For a small team trying to move quickly, it felt like the answer.

DynamoDB was a genuine pleasure for simple read-heavy patterns. Lambda scaled automatically without any configuration. CloudWatch gave you alarms out of the box. The AWS documentation, whatever else you might say about it, was thorough. The ecosystem was mature. The tooling worked.

Enjoy this period. It does not last.

When AWS Starts Showing Its Shape

At some point you need to make a change that the service was not designed for. That is when things get interesting in the wrong way.

We needed to move a load balancer to a different VPC. On any normal piece of infrastructure, that is a configuration change. In AWS, it is a replacement. The load balancer has to be destroyed and recreated, which means a migration plan, a maintenance window, and a conversation with stakeholders about downtime for what should have been an afternoon of work. Nothing broke. Nothing failed. We simply ran into the edge of what the service permits, and the service won.

That is not an isolated example. It is the texture of working with AWS at any meaningful depth. The services are opinionated about how they want to be used, and when your requirements drift from those opinions, you pay for it. Not always in money. In time, in workarounds, in architectural decisions shaped by the platform rather than the problem.

Cold starts compound as the system grows. A Lambda with a two-second cold start is annoying on its own. Four of them invoked in series is eight seconds of user-facing latency. You add provisioned concurrency to keep instances warm and suddenly you are paying for compute around the clock whether or not requests are coming in. The “pay only for what you use” promise has been quietly renegotiated.
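The compounding is easy to quantify. A back-of-envelope sketch, with all figures illustrative rather than measured (the provisioned-concurrency rate here is a ballpark, not a quote):

```typescript
// Back-of-envelope latency and cost arithmetic; the cold-start duration
// and pricing figures are illustrative, not measured from a real system.

// Four functions invoked in series, each with a 2 s cold start
// and 50 ms of warm execution time.
const coldStartMs = 2000;
const warmMs = 50;
const chainLength = 4;

const worstCaseMs = chainLength * (coldStartMs + warmMs); // every hop cold
const bestCaseMs = chainLength * warmMs;                  // every hop warm

console.log(`worst case: ${worstCaseMs} ms`); // 8200 ms of user-facing latency
console.log(`best case:  ${bestCaseMs} ms`);  // 200 ms

// Provisioned concurrency keeps instances warm, but the reserved capacity
// is billed every second of the month, idle or not.
// Hypothetical rate per GB-second of provisioned concurrency:
const ratePerGbSecond = 0.0000041667;
const gbPerInstance = 0.5;      // memory per instance
const instances = 4 * 2;        // two warm instances per function
const secondsPerMonth = 30 * 24 * 3600;

const monthlyCost = gbPerInstance * instances * secondsPerMonth * ratePerGbSecond;
console.log(`provisioned concurrency: ~$${monthlyCost.toFixed(2)}/month`);
```

The point is not the exact figure; it is that the cost term no longer contains the request count at all.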

Then your DynamoDB access patterns shift. And they will shift, because requirements always change. What was an index optimisation problem on a relational database becomes a data model migration on DynamoDB.

Here is a real example. We needed a new query pattern, which meant a secondary index that did not exist on the current table. On a relational database you write CREATE INDEX and get on with your day. In DynamoDB, a Global Secondary Index can at least be added to a live table, but the key schema and any Local Secondary Indexes are fixed at creation, and our access pattern fell on the wrong side of that line. The answer, in our case, was that the table had to be recreated.

So now we had a migration project. Provision the new table with the correct schema. Write a script to copy the existing data across. Set up a DynamoDB stream on the old table so that writes made during the migration flow through to the new one, because we could not take the service down. Test the copy, verify item counts, make sure the stream was keeping up. Update the application to point at the new table. Monitor for a while to be sure nothing was lost. Decommission the stream. Delete the old table.
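The dual-write shape of those steps can be sketched in a few lines. This is a simulation against in-memory maps rather than the real DynamoDB API, purely so the copy-then-drain logic is visible:

```typescript
// Simulation of the copy-and-verify migration pattern described above.
// Maps stand in for DynamoDB tables; an array stands in for the stream.
type Item = { pk: string; data: string };

const oldTable = new Map<string, Item>();
const newTable = new Map<string, Item>();
const stream: Item[] = []; // writes to the old table during the copy

// Seed the old table.
for (let i = 0; i < 5; i++) {
  oldTable.set(`item-${i}`, { pk: `item-${i}`, data: `v${i}` });
}

// Phase 1: bulk copy a snapshot of the old table.
for (const [pk, item] of oldTable) {
  newTable.set(pk, item);
}

// A write arrives mid-migration; it hits the old table and the stream.
const lateWrite: Item = { pk: 'item-5', data: 'v5' };
oldTable.set(lateWrite.pk, lateWrite);
stream.push(lateWrite);

// Phase 2: drain the stream so the new table catches up.
for (const item of stream) {
  newTable.set(item.pk, item);
}

// Phase 3: verify item counts before cutting the application over.
console.log(oldTable.size === newTable.size ? 'counts match' : 'MISMATCH');
```

Every line of that logic is code you write, test, and babysit, for an outcome a relational database gives you in one DDL statement.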

That was the process for adding an index. Not an afternoon. A migration project, with real risk, coordination overhead, and the nagging possibility that the data in the new table was subtly inconsistent with what had been in the old one. We ran it on a Saturday morning because of course we did.

The Lock-In Is Not a Side Effect, It Is the Product

Individual AWS quirks are frustrating. The compound effect is what makes them a serious problem.

Every Lambda we wrote was code that assumed an AWS execution environment. Every CDK construct was infrastructure logic that only runs against CloudFormation. Every DynamoDB table held data shaped around DynamoDB’s access model. Every IAM policy, every VPC configuration, every CloudWatch alarm: none of it means anything outside of AWS. We were not just using a hosting provider. We were building on a proprietary platform, and every line of code deepened that dependency.

The insidious part is the timeline. In the first year the lock-in is invisible because everything works and migration is not on anyone’s mind. By the second year you can feel the constraints but they are not yet alarming. By the time you genuinely decide that AWS is not working for you, whether that is because the bill has grown faster than the product, because you have hit a fundamental architectural limitation, or because you need something the platform does not offer cleanly, you are not looking at a migration anymore. You are looking at a rewrite.

Thousands of lines of application code. Thousands of lines of infrastructure code. An operational model built entirely around AWS assumptions. A team that has spent two years developing expertise in tools that transfer nowhere. The moment you want to leave, you understand exactly how stuck you are.

This is not a design oversight. It is the business model.

What It Actually Costs

The pricing felt reasonable at the start. Lambda and DynamoDB are cheap at low scale. Then I ran the numbers on a comparable workload running on DigitalOcean and Hetzner and felt a bit sick.

Data transfer costs are the first shock. AWS charges for egress in a way that accumulates quietly and then appears on a bill you were not expecting. CloudWatch log ingestion and storage add up. CloudFront has its own pricing surface. Each new managed service brings another billing dimension, and understanding your total spend becomes its own part-time job as the architecture grows.
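Egress is worth running the numbers on explicitly. The rates below are ballpark illustrations of the two pricing shapes, not quotes from either provider's current price list:

```typescript
// Ballpark egress comparison; all rates are illustrative, not quotes.
const egressGb = 2000; // 2 TB served to the internet per month

// Metered model: small free allowance, then roughly $0.09/GB.
const meteredFreeGb = 100;
const meteredRate = 0.09;
const meteredBill = Math.max(0, egressGb - meteredFreeGb) * meteredRate;

// Pooled model: 1 TB bundled with the servers, ~$0.01/GB overage.
const pooledFreeGb = 1000;
const pooledRate = 0.01;
const pooledBill = Math.max(0, egressGb - pooledFreeGb) * pooledRate;

console.log(`metered egress: $${meteredBill.toFixed(2)}/month`); // $171.00
console.log(`pooled egress:  $${pooledBill.toFixed(2)}/month`);  // $10.00
```

An order of magnitude of difference, on a line item most teams never model before they commit.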

A deployment costing several hundred pounds a month on AWS frequently runs for a fraction of that on a managed Kubernetes cluster, or on hardware you actually control. The managed services you are paying a premium for on AWS can be replaced with open-source tooling that runs wherever you tell it to.

What We Should Have Built

The architecture that would have served us better is one you can describe without naming a vendor.

Docker containers, deployed to Kubernetes, with application logic in Spring. The business logic is portable because Spring is a framework and frameworks run anywhere. The containers run on DigitalOcean today. They can run on Hetzner, on bare metal, or on AWS itself if the economics ever swing that way. We are not betting the codebase on any single provider’s pricing stability or continued good behaviour.

The infrastructure tooling for this stack is mature and genuinely transferable. Ansible handles VM provisioning, cluster bootstrapping, network configuration, and dependency installation, with playbooks that are readable and repeatable and no proprietary control plane in sight. Terraform manages cloud resources through provider APIs that span multiple clouds, so the infrastructure code is not permanently attached to one vendor. ArgoCD handles deployments as GitOps: the desired cluster state lives in source control, changes are pull requests, and the cluster converges to match. The audit trail is the git log.
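The GitOps half of that is compact enough to show. A minimal ArgoCD Application sketch, with the repository URL, paths, and names as placeholders:

```yaml
# Illustrative ArgoCD Application: repo URL, path, and names are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-config.git
    targetRevision: main
    path: services/my-service
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  syncPolicy:
    automated:
      prune: true      # delete resources removed from git
      selfHeal: true   # revert manual drift back to the git state
```

Everything about the deployment lives in that repository, and nothing about it cares which cloud the cluster runs on.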

For content delivery, Cloudflare does a better job than CloudFront and charges less for it. No AWS egress fees for content served at the edge. A simpler pricing model. WAF rules that do not require a doctorate in AWS policy syntax. Switching is an afternoon’s work and the difference in the bill is immediate.

On the Scaling Argument

The standard defence of serverless is that Lambda handles traffic spikes automatically. You do not have to think about capacity because the platform scales to meet demand.

This is true. It is also less of an advantage than it sounds.

Kubernetes scales too. Horizontal Pod Autoscaler adds pods when load crosses a threshold. Cluster Autoscaler adds nodes when pods cannot be scheduled. The scaling is not as instant as Lambda's, but for the overwhelming majority of real workloads that does not matter. Your traffic spike is not arriving in the next hundred milliseconds. You have time.
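Both autoscalers are driven by declarative thresholds you set yourself. A minimal Horizontal Pod Autoscaler sketch, with names and numbers as examples only:

```yaml
# Illustrative HorizontalPodAutoscaler: names and thresholds are examples.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU crosses 70%
```

Those three numbers are the capacity model, and they are yours to read, reason about, and change.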

What you get in return is control. You understand the infrastructure. You can predict what a given load level will cost. You can tune the behaviour. You are not subject to Lambda concurrency limits, to provisioned concurrency pricing, or to the operational characteristics of a runtime you cannot inspect or modify.

So Was Serverless a Mistake?

Not entirely. If you are two engineers building an early-stage product with uncertain traffic and no desire to manage infrastructure, the serverless experience in that phase is genuinely excellent and the trade-offs are appropriate. The lock-in is a problem for future-you, and future-you may have resources and options that present-you does not.

But if you are building something you intend to operate for years, with a team that will grow and a cost base that needs to stay predictable, the calculation looks different. The convenience of the early phase is real. So is the cost of the later phase, in AWS bills, in architectural constraints, and in the difficulty of leaving once you are committed.

The question is not whether serverless works. It works. The question is whether you have thought clearly about what you are trading for that convenience, and whether it will still feel like a good deal when the bill lands two years from now.

Most teams have not thought about it. Most teams find out the hard way.

We did.
