Infrastructure as Code, not Configuration
By: Matt Jachowski
When my co-founder and I started our company, I immediately offered to be the “infrastructure guy”. In retrospect, it’s amusing how much I underestimated the difficulty of that task. We were setting out to build a cloud-based developer tool in AWS. I had used AWS only once before, during a college internship more than a decade earlier. But I knew a lot of people used AWS, so it couldn’t be that hard. And I had spent the past 10 years writing and optimizing high-performance C++ code, which definitely felt more hardcore.
More than a year after launch, I can confidently say that I was wrong. Building complex cloud infrastructure with no prior experience is hard. As powerful and widely used as AWS is, it has a steep learning curve. I started from zero, and it took many long hours to feel any level of competence. But I eventually figured it out. Today, our infrastructure is composed of an alphabet soup of AWS resources including CloudWatch, CloudTrail, CloudFront, Route 53, VPCs, ELBs, RDS, DocumentDB, Lambdas, Cognito, EC2, EBS, EFS, AMIs, ECS, ECR, S3, SNS, and SQS.
Most significantly, I think we did it right. No one on our small team spends all of their time on infrastructure and no one is afraid to touch the infrastructure code. Any developer can deploy a full, independent copy of our infrastructure with a single command. We make a complete infrastructure deployment to test each pull request, update our test deployment on every merge to our main branch, and update our production deployment every day. No single piece of infrastructure is sacred and each deployment wholesale replaces most of our resources. We experiment, iterate quickly, and test new infrastructure code in isolation without fear of downtime.
This will come as no surprise to any infrastructure engineer, but the key to all of this is Infrastructure as Code (IAC). AWS documentation gets you started with their web-based consoles, but as soon as things get complex, IAC is the way to go. I initially evaluated CloudFormation and Terraform, and chose CloudFormation because I guessed that we would be AWS-only for at least a while. My initial impression of CloudFormation wasn’t great. It worked and it exposed most of the parameters I needed, but I found writing raw YAML templates tedious and error-prone. This was fine for small prototypes, but quickly became untenable as our infrastructure grew and the templates became longer and more complex.
    Resources:
      DefaultTargetGroup:
        Properties:
          Port: '80'
          Protocol: HTTP
          TargetType: ip
          VpcId: !ImportValue 'conducto-demo-vpc-VPC'
        Type: AWS::ElasticLoadBalancingV2::TargetGroup
      LoadBalancer:
        Properties:
          Scheme: internet-facing
          SecurityGroups:
            - !Ref 'LoadBalancerSecurityGroup'
          Subnets:
            - !ImportValue 'conducto-demo-vpc-PublicSubnet0'
            - !ImportValue 'conducto-demo-vpc-PublicSubnet1'
        Type: AWS::ElasticLoadBalancingV2::LoadBalancer
      ...
A snippet of CloudFormation YAML. Looks reasonable, until you end up with thousands of lines of it.
Then I discovered Troposphere, which was a revelation. It’s a thin Python wrapper over raw CloudFormation. To the uninitiated, that might seem like a small thing. But being able to use Python, or any other programming language, instead of YAML is a game changer. You get code completion, type checking, debugging features, and the ability to use normal programming language constructs like for loops, functions, and variables in a familiar way. This is actually Infrastructure as Code.
    def gen_default_target_group(self):
        self.default_target_group = elasticloadbalancingv2.TargetGroup(
            "DefaultTargetGroup",
            VpcId=self.import_value("vpc", "VPC"),
            Port="80",
            TargetType="ip",
            Protocol="HTTP",
        )
        self.template.add_resource(self.default_target_group)

    def gen_load_balancer(self):
        self.load_balancer = elasticloadbalancingv2.LoadBalancer(
            "LoadBalancer",
            SecurityGroups=[Ref(self.load_balancer_security_group)],
            Subnets=[
                self.import_value("vpc", "PublicSubnet0"),
                self.import_value("vpc", "PublicSubnet1"),
            ],
            Scheme="internet-facing",
        )
        self.template.add_resource(self.load_balancer)
        value = GetAtt(self.load_balancer, "DNSName")
        self.export_value(value, "DNSName")
A snippet of CloudFormation in Python with Troposphere. Despite the surface similarities to the YAML, Python is a huge improvement.
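To make the point about loops, functions, and variables concrete, here is a minimal sketch that builds the load balancer fragment from plain Python dictionaries, using only the standard library. It is not Troposphere and not Conducto's actual code. The helper names `import_name` and `load_balancer_template`, the `deployment` parameter, and the `conducto-<deployment>-<stack>-<key>` naming scheme are assumptions, the last one inferred from the import names in the YAML snippet above.

```python
import json


def import_name(deployment, stack, key):
    """Build a cross-stack export name such as 'conducto-demo-vpc-VPC'.

    The naming scheme is an assumption inferred from the YAML snippet.
    """
    return f"conducto-{deployment}-{stack}-{key}"


def load_balancer_template(deployment, public_subnet_count=2):
    """Return a CloudFormation template fragment (as a dict) for one deployment."""
    # A loop replaces the hand-copied list of subnet imports.
    subnets = [
        {"Fn::ImportValue": import_name(deployment, "vpc", f"PublicSubnet{i}")}
        for i in range(public_subnet_count)
    ]
    return {
        "Resources": {
            "LoadBalancer": {
                "Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
                "Properties": {
                    "Scheme": "internet-facing",
                    # The security group is assumed to be defined elsewhere
                    # in the full template; it is only referenced here.
                    "SecurityGroups": [{"Ref": "LoadBalancerSecurityGroup"}],
                    "Subnets": subnets,
                },
            }
        }
    }


if __name__ == "__main__":
    # CloudFormation accepts JSON as well as YAML templates.
    print(json.dumps(load_balancer_template("demo"), indent=2))
```

Because the deployment name is an ordinary variable, calling `load_balancer_template` with a different name yields a fragment that imports from a different set of stacks. That is one plausible way to get independent copies of the same infrastructure without duplicating any template text.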
I’m very happy with Troposphere. If I were starting over today, I would also evaluate Pulumi and the AWS CDK, two newer IAC solutions that actually use code too. Our tool, Conducto, has a similar flavor: it lets developers specify complex DevOps and data science pipelines in Python, rather than in the YAML or GUIs that most competing tools rely on. I hope more developer tools move in this direction.