
Spending 100K USD in 4.5 days on Amazon Web Services

Submitted by walterheck on April 25, 2017

This post is probably one of the more embarrassing post-mortems I have ever written, but I feel it's important to share it so that we can warn others and hopefully spare them a lot of pain in the process.

A word of warning: there are a lot of beginner mistakes in the following. A combination of being notoriously spread thin as a founder/CTO who still does client work, a whole bunch of naivety and a plain lack of understanding of AWS on my part makes some of the below super obvious in hindsight. While our engineers are some of the smartest in the AWS field (our clients' words, not just mine), my personal experience with AWS is limited to some basic stuff. Admitting to and owning up to my mistakes here has been hard on me, but in my opinion that's the price I need to pay for this whole episode.

What happened?

At OlinData we have been searching for new engineers to hire as our business grows. Since a lot of our focus is on AWS at the moment, part of the hiring process is a tech assessment that gives applicants three hours to complete a number of tasks on AWS. For that they need some kind of access, so I created a separate AWS account for the assessments and hooked it up to consolidated billing on our main account. I was under the impression I was doing a smart thing by locking applicants down to their own sub-account.

Part of the deliverables of the assignment can be some kind of IaC (Infrastructure as Code) tool, for instance Terraform or Ansible, whatever they prefer.

Not all of our applicants come in with a ton of AWS experience, and it's not necessarily the only thing that will get them hired (we also do non-AWS work). As a result, much of the Terraform / CloudFormation / Ansible code we see is subpar at best, which is exactly the kind of thing we want to surface with this assessment.

One of our applicants handed in his assignment on April 8th by posting a Terraform repo to a public GitHub repository, but the repo had AWS credentials in it. An obvious no-no, and it was caught by the engineer doing the review. At that moment I should have immediately checked whether the credentials had been compromised (i.e. used to create resources or anything else) and disabled them, but instead I did nothing (read: it was during the weekend and I had one other assessment that day and a job interview, all while trying to enjoy my weekend with my family). Some more applicants took the assessment, and on April 14th at 11pm, after logging into the AWS Console and doing some work, I noticed the following situation:

[Screenshot: 90K USD in Cost Explorer]
Quite the surprise..

My heart jumped (as it does when you first see this kind of enormous mistake), and after exploring it in more detail I found out this was indeed very much real.

Most likely the credentials were scraped off GitHub by attackers, who used them to spin up 20 c4.8xlarge instances in every AWS region worldwide. At its peak that ran up around 20K USD per day.

How did we respond?

My first response was to close down the account, as it was only used for these tech assessments and didn't contain anything of value. I was under the impression that closing the account would terminate all resources and remove/delete everything we were paying for. I later found out through AWS support that this is not necessarily true: “When closing the account, not all resources are terminated upon closing the account.”

Next, I opened a support case from the main OlinData account to notify AWS of the abuse and had a subsequent initial call. Simultaneously I googled around what to do in this case and called one of our most senior AWS engineers out of bed.

AWS support told me I needed to start a support case from the compromised account, not the parent account. Apparently even after closing an account you can still log into it and access the support portal. I started the case from the compromised account, asked to have it reinstated and went through and deleted the offending resources.

We currently have a support case open with Amazon Web Services to get the charges removed from our account. We’ll know more soon.

What actions have we taken?

As with any good post-mortem, action needs to be taken to prevent this from happening again. The following is a list of actions taken and about to be taken:

  • Billing alerts have been set up (a minimal Terraform sketch follows this list)

  • We started using AWS Organizations with Service Control Policies (SCP) (http://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_about-scps.html). With these we lock the test account(s) down to only some of the core AWS services (see the second sketch after this list).

  • Deprecated the old tech assessment account and created a new one. In the new one, created two User Groups: one with an IAM policy called ‘Applicants’ attached to it and another with an IAM policy called ‘admins’. These policies lock down permissions further than the SCP on Organizations already does.

  • Each new applicant gets a new IAM user that goes in the applicants group and as such gets their permissions locked down

  • Added a clear note to the tech assessment text explaining that posting credentials online will immediately disqualify a candidate. That should deter people who don’t necessarily have much experience from doing so in the future.
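To give an idea of what the billing alert looks like, here is a minimal Terraform sketch. The names, threshold and e-mail address are placeholders, and it uses current AWS provider syntax rather than exactly what we set up at the time. AWS only publishes the EstimatedCharges metric in us-east-1, and billing metrics have to be enabled on the account first:

```hcl
# Billing metrics live in us-east-1 only, so pin a provider alias there.
provider "aws" {
  alias  = "billing"
  region = "us-east-1"
}

# Topic that receives the alarm; subscribe whoever should be woken up.
resource "aws_sns_topic" "billing_alerts" {
  provider = aws.billing
  name     = "billing-alerts"
}

# E-mail subscriptions stay "pending" until the recipient confirms them.
resource "aws_sns_topic_subscription" "billing_email" {
  provider  = aws.billing
  topic_arn = aws_sns_topic.billing_alerts.arn
  protocol  = "email"
  endpoint  = "ops@example.com" # placeholder address
}

# Fire when the estimated month-to-date bill crosses the threshold.
resource "aws_cloudwatch_metric_alarm" "estimated_charges" {
  provider            = aws.billing
  alarm_name          = "estimated-charges-too-high"
  namespace           = "AWS/Billing"
  metric_name         = "EstimatedCharges"
  statistic           = "Maximum"
  period              = 21600 # six hours
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 100 # USD, pick something sensible for your account
  alarm_description   = "Estimated AWS charges are above the expected level"
  alarm_actions       = [aws_sns_topic.billing_alerts.arn]

  dimensions = {
    Currency = "USD"
  }
}
```

With something like this in place we would have been paged long before the bill reached five digits.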
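And this is roughly what the Service Control Policy side looks like in Terraform. The service allow-list and the names below are illustrative, not our exact policy. Keep in mind that an SCP only sets the ceiling on what an account can do; IAM policies inside the account still have to grant the actual permissions:

```hcl
# ID of the Organizational Unit that holds the tech-assessment account(s).
variable "assessment_ou_id" {
  type = string
}

# Allow-list SCP: accounts under the OU can only use these services at all.
# (This assumes the default FullAWSAccess policy is detached from the OU.)
resource "aws_organizations_policy" "assessment_core_services" {
  name        = "assessment-core-services-only"
  description = "Restrict assessment accounts to a handful of core services"
  type        = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowCoreServicesOnly"
        Effect = "Allow"
        Action = [
          "ec2:*",
          "s3:*",
          "iam:*",
          "cloudwatch:*",
          "cloudformation:*",
        ]
        Resource = "*"
      }
    ]
  })
}

resource "aws_organizations_policy_attachment" "assessment_ou" {
  policy_id = aws_organizations_policy.assessment_core_services.id
  target_id = var.assessment_ou_id
}
```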

What actions will we take in the future?

The above is a good step in the right direction, but more can be done:

  • Lock down the IAM policy for applicants so they are locked into a single region and limited to free-tier instance sizes (a sketch follows this list)

  • Start using CloudCustodian: https://github.com/capitalone/cloud-custodian

  • Add monitoring for unwanted activity, possibly through CloudTrail (a sketch of a basic trail also follows this list)

  • Look into AWS Config Rules
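For the applicants group, the guardrails we have in mind look something like the sketch below. The region, the instance types and all names are assumptions for the sake of the example, and a real policy needs more statements than shown here (the Allow side for RunInstances, S3 and so on):

```hcl
# Group that every applicant's IAM user is added to.
resource "aws_iam_group" "applicants" {
  name = "Applicants"
}

# Guardrail policy: deny anything EC2-related outside one region and deny
# launching anything bigger than the free-tier instance sizes.
resource "aws_iam_group_policy" "applicant_guardrails" {
  name  = "applicant-guardrails"
  group = aws_iam_group.applicants.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid      = "DenyEc2OutsideHomeRegion"
        Effect   = "Deny"
        Action   = "ec2:*"
        Resource = "*"
        Condition = {
          StringNotEquals = { "aws:RequestedRegion" = "eu-west-1" }
        }
      },
      {
        Sid      = "DenyLargeInstanceTypes"
        Effect   = "Deny"
        Action   = "ec2:RunInstances"
        Resource = "arn:aws:ec2:*:*:instance/*"
        Condition = {
          StringNotEquals = { "ec2:InstanceType" = ["t2.micro", "t2.small"] }
        }
      }
    ]
  })
}

# One IAM user per applicant, dropped into the group above.
resource "aws_iam_user" "applicant" {
  name = "applicant-example" # placeholder, one per candidate
}

resource "aws_iam_user_group_membership" "applicant" {
  user   = aws_iam_user.applicant.name
  groups = [aws_iam_group.applicants.name]
}
```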
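For the CloudTrail part, a basic multi-region trail shipping to S3 would look roughly like this (bucket and trail names are placeholders). From there the logs can be fed into alerting on things like RunInstances calls in regions we never use:

```hcl
# Bucket that receives the CloudTrail logs; the name must be globally unique.
resource "aws_s3_bucket" "cloudtrail_logs" {
  bucket = "olindata-assessment-cloudtrail-logs" # placeholder
}

# Standard bucket policy that lets CloudTrail write its log files.
resource "aws_s3_bucket_policy" "cloudtrail_logs" {
  bucket = aws_s3_bucket.cloudtrail_logs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "AWSCloudTrailAclCheck"
        Effect    = "Allow"
        Principal = { Service = "cloudtrail.amazonaws.com" }
        Action    = "s3:GetBucketAcl"
        Resource  = aws_s3_bucket.cloudtrail_logs.arn
      },
      {
        Sid       = "AWSCloudTrailWrite"
        Effect    = "Allow"
        Principal = { Service = "cloudtrail.amazonaws.com" }
        Action    = "s3:PutObject"
        Resource  = "${aws_s3_bucket.cloudtrail_logs.arn}/AWSLogs/*"
        Condition = {
          StringEquals = { "s3:x-amz-acl" = "bucket-owner-full-control" }
        }
      }
    ]
  })
}

# A single trail covering every region, including global service events.
resource "aws_cloudtrail" "assessment" {
  name                          = "assessment-account-trail"
  s3_bucket_name                = aws_s3_bucket.cloudtrail_logs.id
  is_multi_region_trail         = true
  include_global_service_events = true

  depends_on = [aws_s3_bucket_policy.cloudtrail_logs]
}
```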

What could we have done better in response?

  • Immediately open the support case from the correct (compromised) account instead

  • Not close down the affected account; this turned out to do more harm than good

  • The moment we found out the credentials were public, assume they were compromised and act accordingly

  • One of our engineers suggested snapshotting at least one instance so we could later analyse its disk image to see what these machines were used for

  • Record details on one or more of the instances that were brought up for later analysis

Final thoughts

  • There’s lots of advice on AWS security measures; do _not_ ignore it. Check for instance https://www.slideshare.net/JulienSIMON5/aws-security-best-practices-march-2017

  • AWS Organizations is a nice feature, but it’s complicated. We ended up not using it to set fine-grained permissions but only to determine which services are available to the accounts belonging to a specific Organizational Unit (OU). Each AWS account in an OU is then responsible for IAM policies that restrict users and groups further.

  • As much as it is possible to prevent this kind of thing from happening, I’m a bit disappointed by the amount of custom effort we need to put in to accomplish it.