Consider the following scenario:
- A DevOps person runs a CloudFormation template to create a stack. Let's say it's a network stack that defines all the traffic that can go in and out of your network.
- A nefarious or ill-advised person logs into the AWS console and manually changes the networking rules and opens ports to allow evil traffic. For example perhaps the person creates rules that open ports for WanaCryptor intentionally or unintentionally (though hopefully no one is running SMBv1 on AWS!)
- DevOps person re-runs the networking stack via CloudFormation to restore the network to the desired state.
Does this work?
No.
How Does CloudFormation Know When To Make a Change?
CloudFormation seems to only know about the changes it made and differences in the template that it is running vs. the last template it ran. CloudFormation will compare the old unchanged template and new template and go "Cool, everything's all good. Move on".
I just manually changed an S3 endpoint policy, bucket policies and IAM policies in the console and then re-ran the original CloudFormation stacks and my policy changes remained intact.
How Does This Impact Security of Your AWS Account?
If you have a critical, security related stack and you want to maintain that stack in a secure state, you should structure your account to ONLY allow CloudFormation (if that is your tool of choice) and write security policies to allow the appropriate people to only update these stacks through your well-audited deployment mechanism. You might also be able to use Config rules and other event triggers but that seems more complicated and error prone than a straight-forward process of locking down how things are deployed in your account. If you only find out about a problem AFTER it happened and then fix it, might be too late. I explain this in more detail in this white paper on Event Driven Security Automation.
How Can Manual Problems Be Fixed?
In order to fix this problem a change can be made to the template that forces an update. In the case of my policies, I can alter the policies in my template to force them to be updated, for example by changing the SID. Deleting something from a template, running the template, then recreating can work in some cases. Manually deleting things created outside CloudFormation in the console is an option. However, deleting resources is not an option when you have existing systems running that cannot be taken offline that are using the network you are trying to delete. In fact, if you try to do this your CloudFormation stack may end up in a state that is difficult, if not impossible, to fix - though some new features have been added to CloudFormation to override certain problems. You could create a new network and move resources so the new network so you can delete the old one, but that also can be very complicated.
Recommendation for Deploying Security Controls on AWS
For this reason...I highly recommend that if you use CloudFormation, for critical stacks such as security appliances and networking, make that the only option for deployment and create appropriate stack policies so those stacks cannot be altered to an undesirable state. In fact, I would recommend that in production accounts, only automated processes should be used for deployments. In QA, using only automated deployments ensures your testing is accurate. Using only automated mechanisms in your development environment will ensure your automation works. If you MUST provide manual access, create a sandbox for testing and clicking buttons. You could also find something other than CloudFormation to automate and control how resources are deployed. CloudFormation is not the only way to do this, however it offers a lot of built in security benefits.
CloudFormation Can Improve Security In Your AWS Account, When Used Properly
How Can Manual Problems Be Fixed?
In order to fix this problem a change can be made to the template that forces an update. In the case of my policies, I can alter the policies in my template to force them to be updated, for example by changing the SID. Deleting something from a template, running the template, then recreating can work in some cases. Manually deleting things created outside CloudFormation in the console is an option. However, deleting resources is not an option when you have existing systems running that cannot be taken offline that are using the network you are trying to delete. In fact, if you try to do this your CloudFormation stack may end up in a state that is difficult, if not impossible, to fix - though some new features have been added to CloudFormation to override certain problems. You could create a new network and move resources so the new network so you can delete the old one, but that also can be very complicated.
Recommendation for Deploying Security Controls on AWS
For this reason...I highly recommend that if you use CloudFormation, for critical stacks such as security appliances and networking, make that the only option for deployment and create appropriate stack policies so those stacks cannot be altered to an undesirable state. In fact, I would recommend that in production accounts, only automated processes should be used for deployments. In QA, using only automated deployments ensures your testing is accurate. Using only automated mechanisms in your development environment will ensure your automation works. If you MUST provide manual access, create a sandbox for testing and clicking buttons. You could also find something other than CloudFormation to automate and control how resources are deployed. CloudFormation is not the only way to do this, however it offers a lot of built in security benefits.
CloudFormation Can Improve Security In Your AWS Account, When Used Properly
More in my Secplicity blog post: Benefits of CloudFormation for Secure Cloud Deployments