AWS: reliable? It ain’t!

Let me caveat that.

There’s a common idea that anything running on AWS is bulletproof. It’s not.

AWS has internal issues, as does any provider: e.g. network problems (remember that S3 EOF bug?), failing disk drives (Retirement Notifications, anyone?), etc.

On top of that, stuff will hit internal inconsistencies. E.g. your Auto Scaling group (ASG) tries to launch an EC2 instance, only to fail ‘cos you’ve reached your account’s limit for that particular instance type.

But you can build around this in your app with Error Retries and Exponential Backoff (techniques probably more familiar to mobile developers).
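If it helps, here’s a minimal Python sketch of the idea. It isn’t AWS’s or any SDK’s implementation; the names (`retry_with_backoff`, `TransientError`) and the default numbers are made up for illustration. It retries only errors you’ve marked as retryable, doubles the wait cap on each attempt, and uses “full jitter” (a random wait between zero and the cap) so a fleet of clients doesn’t retry in lockstep.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (throttling, timeouts, capacity)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,   # seconds; illustrative default
    max_delay: float = 20.0,   # seconds; illustrative default
    retryable: Tuple[Type[BaseException], ...] = (TransientError,),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `operation`, retrying retryable errors with exponential backoff and full jitter."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The cap doubles each time: 0.5s, 1s, 2s, 4s, ... never above max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap].
            sleep(rng.uniform(0, cap))
    raise ValueError("max_attempts must be >= 1")


# Usage: a hypothetical launch call that fails twice before succeeding.
calls = {"n": 0}


def flaky_launch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("capacity unavailable, try again")
    return "i-0123456789abcdef0"  # made-up instance ID


# A no-op sleep keeps the demo instant; in real code leave the default time.sleep.
print(retry_with_backoff(flaky_launch, sleep=lambda s: None))  # prints i-0123456789abcdef0
```

Note that errors outside `retryable` (a bad parameter, say) are raised immediately, since retrying them just wastes time. Also, the AWS SDKs already retry many throttling and transient errors on their own (boto3, for instance, has configurable retry modes), so a wrapper like this matters most for your own app-level operations.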


E.g. here’s a solution Terragrunt uses:

