
CloudFormation, Terraform, or CDK? A guide to IaC on AWS

Jared Short

I’ve been working with Amazon Web Services for 6 years or so, and though the cloud has changed a lot in that time, one thing has remained consistent: Infrastructure as Code (IaC) is a core pillar of a healthy implementation of AWS.

For anything bigger than a toy cloud application, IaC is table stakes. You’d be hard-pressed to find someone managing anything of scale who thinks letting folks point and click in the console is the optimal route.

These days, I find it faster to start every application, and even every proof of concept, with an IaC tool and go from there. Time and time again, I have found it easier to return to a project weeks or months later and quickly understand how things work from a familiar baseline and context. I don’t have to rebuild from scratch in my mind exactly what I was thinking.

The “how” of how one approaches IaC is, of course, AWS engineers’ very own version of the old “tabs vs spaces” debate.

So what IaC tools are available to you in AWS, and how do you choose between them?

AWS CloudFormation

CloudFormation (CFN) is the original IaC tool for AWS, released in 2011. I have come to respect, hate, love, and revere its power to describe and manage infrastructure. CFN was originally only offered in JSON, but we were finally treated to a heaping helping of tabs vs spaces actually mattering with native CFN YAML support in 2016.

CloudFormation is one of the safest ways to build, manage, change, and destroy resources in your infrastructure. It offers robust resource state management, and these days it can tell you what is going to happen before you run your deployment.

A lot of great features have worked to make CloudFormation more enjoyable or productive to work with over the years.

CloudFormation Macros & Transforms

Macros and transforms are among CloudFormation’s more powerful concepts: they let you add your own opinionated capabilities to CloudFormation. For example, Trek10 provides a macro that lets you write Step Functions Amazon States Language (ASL) in pure YAML and smartly resolves CFN intrinsic functions.

You could imagine providing opinionated IAM policy generators or S3 bucket resource macros. Whatever you want to do, macros can likely get you there. Take note, though: while macros are powerful, you are treading on dangerous territory, because it becomes easy to effectively build your own Domain-Specific Language. Instead of CloudFormation managing your resources, you end up using CloudFormation as a bad Domain-Specific Language compiler that you have to babysit.
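To make the idea concrete, here is a minimal sketch of what a template-level macro's Lambda handler could look like. CloudFormation passes the macro a template fragment along with a request ID and expects the same request ID, a status, and the processed fragment back. The event type below is trimmed to the fields the sketch uses, and the "encrypt every bucket by default" policy, the function name, and the sample values are assumptions made up for illustration, not the Trek10 macro mentioned above.

```typescript
// Trimmed shape of the event CloudFormation sends to a macro's Lambda
// function. The real event carries more fields (region, accountId, params...).
interface MacroEvent {
  requestId: string;
  fragment: TemplateFragment;
}

interface MacroResponse {
  requestId: string;
  status: "success" | "failure";
  fragment: TemplateFragment;
}

interface TemplateResource {
  Type: string;
  Properties?: Record<string, unknown>;
}

interface TemplateFragment {
  Resources: Record<string, TemplateResource>;
}

// Hypothetical team policy: S3 buckets are encrypted with SSE-S3 by default.
const DEFAULT_BUCKET_ENCRYPTION = {
  ServerSideEncryptionConfiguration: [
    { ServerSideEncryptionByDefault: { SSEAlgorithm: "AES256" } },
  ],
};

function encryptBucketsMacro(event: MacroEvent): MacroResponse {
  const resources: Record<string, TemplateResource> = {};
  for (const logicalId of Object.keys(event.fragment.Resources)) {
    const resource = event.fragment.Resources[logicalId];
    resources[logicalId] =
      resource.Type === "AWS::S3::Bucket"
        ? {
            ...resource,
            // The author's properties are spread last, so an explicit
            // BucketEncryption in the template wins over the default.
            Properties: {
              BucketEncryption: DEFAULT_BUCKET_ENCRYPTION,
              ...resource.Properties,
            },
          }
        : resource;
  }
  return {
    requestId: event.requestId,
    status: "success",
    fragment: { ...event.fragment, Resources: resources },
  };
}

// Usage with a hand-written event (values are illustrative).
const macroResult = encryptBucketsMacro({
  requestId: "example-request",
  fragment: {
    Resources: {
      Uploads: { Type: "AWS::S3::Bucket" },
      Jobs: { Type: "AWS::SQS::Queue" },
    },
  },
});
```

Even in this tiny form you can see the trade-off: the template no longer says what actually gets deployed, so anyone reading it also has to know what the macro does.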

Resource Providers

For a while, we only had Custom Resources to provision and manage resources that CloudFormation didn’t natively support. This is now largely superseded by Resource Providers, which allow you to create private or published providers to bring the management of third-party and unsupported resources into your stacks. For example, Datadog, a popular monitoring tool, can be used in your stack to provision and manage your monitoring without needing some out-of-band process.

In most of my recent work with CFN, I’ve defaulted to using the AWS Serverless Application Model, or SAM. SAM is a superset of CFN, with some handy transformations that let you do a bit less typing and wiring up of various resources and permissions. Think of it like a well-thought-out and “managed” macro. If you are doing anything with AWS Lambda or event-driven computing and looking to level up your YAML wrangling, start with SAM.

AWS Cloud Development Kit

AWS Cloud Development Kit (CDK) is the new kid on the block, released in 2019. Using familiar programming languages and the provided libraries for TypeScript, Python, Java, and .NET, developers can manage their infrastructure in the same language as the rest of their stack.

CDK, however, is not devoid of CloudFormation. In fact, CDK synthesizes to CloudFormation. You still leverage all the state management and inherent benefits (and downsides) of CloudFormation by adopting CDK.
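A toy model can show what "synthesizes to CloudFormation" means in practice. The sketch below is not the CDK API: `ToyStack` and its methods are hypothetical names invented for illustration. It only demonstrates the general shape of the idea, where ordinary method calls accumulate resources and a final `synth()` step emits a plain CloudFormation-style template.

```typescript
interface CfnResourceSketch {
  Type: string;
  Properties: Record<string, unknown>;
}

interface CfnTemplateSketch {
  Resources: Record<string, CfnResourceSketch>;
}

// Hypothetical stand-in for a CDK stack: it only collects resources.
class ToyStack {
  private readonly resources: Record<string, CfnResourceSketch> = {};

  private add(logicalId: string, resource: CfnResourceSketch): void {
    if (Object.prototype.hasOwnProperty.call(this.resources, logicalId)) {
      throw new Error(`Duplicate logical ID: ${logicalId}`);
    }
    this.resources[logicalId] = resource;
  }

  // Returns a CloudFormation Ref so other resources can point at the queue.
  addQueue(logicalId: string, visibilityTimeoutSeconds: number = 30): { Ref: string } {
    this.add(logicalId, {
      Type: "AWS::SQS::Queue",
      Properties: { VisibilityTimeout: visibilityTimeoutSeconds },
    });
    return { Ref: logicalId };
  }

  addStringParameter(logicalId: string, name: string, value: string | { Ref: string }): void {
    this.add(logicalId, {
      Type: "AWS::SSM::Parameter",
      Properties: { Type: "String", Name: name, Value: value },
    });
  }

  // "Synthesis": the object graph becomes a declarative template.
  synth(): CfnTemplateSketch {
    return { Resources: { ...this.resources } };
  }
}

// Usage: two method calls in, one JSON template out.
const toyStack = new ToyStack();
const ordersQueue = toyStack.addQueue("OrdersQueue", 60);
toyStack.addStringParameter("OrdersQueueUrl", "/orders/queue-url", ordersQueue);
const toyTemplate = JSON.stringify(toyStack.synth(), null, 2);
```

The output is what CloudFormation deploys, which is why CloudFormation's behavior, limits, and failure modes still apply however the template was produced.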

A quick aside: I do want to highlight that some folks view CFN as the “assembly language” of AWS, largely because of how many tools “compile” down to CFN. I think this is a dangerous comparison. It can lead to the interpretation that, like any high-level language to assembly, you don’t really need to understand how the lower-level instruction set works to effectively leverage the higher-level constructs. In my experience, this is patently untrue in the case of CFN. Even a rudimentary understanding of CFN leads to better decisions in higher-level tools like CDK.

Ultimately, I would contend that CDK is the most comfortable and natural entry point for developers to start building Cloud Native applications.

Constructs

One of the particularly powerful features of CDK that I believe CloudFormation has struggled to natively deliver is the idea of truly shareable and reusable modules. CDK has introduced the concept of constructs. Constructs in practice provide everything from simple wrappings of some specific defaults you would like to re-use across your project all the way to complex multi-resource orchestration and wrapping of resource providers. The distribution method for these constructs then relies on the native

The other important part of CDK Constructs is something neat called jsii. To quote the project: “jsii allows code in any language to naturally interact with JavaScript classes. It is the technology that enables the AWS Cloud Development Kit to deliver polyglot libraries from a single codebase!” If you write your constructs in TypeScript, it is fairly straightforward to distribute and use those constructs across the other core CDK languages, which further encourages sharing of modules.
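As a rough illustration of the construct idea, the sketch below models a tree of constructs in plain TypeScript. None of these classes come from the CDK library: `ToyConstruct`, `ToyBucket`, and `AuditedBucket` are hypothetical names. The point is that a reusable construct is just a class that creates child constructs with your preferred defaults, and that resource logical IDs can be derived from each node's path in the tree. As far as I understand it, the real CDK does something similar and additionally appends a short hash, which is where suffixes like the `C6479370` in the template further down come from.

```typescript
interface ToyResource {
  Type: string;
  Properties: Record<string, unknown>;
}

// Hypothetical base class: every construct has a parent scope and an ID.
class ToyConstruct {
  readonly children: ToyConstruct[] = [];

  constructor(readonly scope: ToyConstruct | undefined, readonly id: string) {
    if (scope) scope.children.push(this);
  }

  // IDs from just below the root down to this node.
  pathIds(): string[] {
    return this.scope ? [...this.scope.pathIds(), this.id] : [];
  }

  // Leaf constructs override this to emit a resource.
  toResource(): ToyResource | undefined {
    return undefined;
  }
}

class ToyBucket extends ToyConstruct {
  constructor(scope: ToyConstruct, id: string, private readonly versioned: boolean) {
    super(scope, id);
  }

  toResource(): ToyResource {
    const properties: Record<string, unknown> = this.versioned
      ? { VersioningConfiguration: { Status: "Enabled" } }
      : {};
    return { Type: "AWS::S3::Bucket", Properties: properties };
  }
}

// A reusable, opinionated construct: a versioned data bucket plus a
// companion bucket for access logs (an assumed team convention).
class AuditedBucket extends ToyConstruct {
  constructor(scope: ToyConstruct, id: string) {
    super(scope, id);
    new ToyBucket(this, "Data", true);
    new ToyBucket(this, "AccessLogs", false);
  }
}

// Walk the tree and key each resource by its concatenated path.
function synthTree(root: ToyConstruct): Record<string, ToyResource> {
  const resources: Record<string, ToyResource> = {};
  const visit = (node: ToyConstruct): void => {
    const resource = node.toResource();
    if (resource) resources[node.pathIds().join("")] = resource;
    for (const child of node.children) visit(child);
  };
  visit(root);
  return resources;
}

// Usage: the same construct is reused twice under different IDs.
const toyApp = new ToyConstruct(undefined, "App");
new AuditedBucket(toyApp, "Invoices");
new AuditedBucket(toyApp, "Receipts");
const toyResources = synthTree(toyApp);
// Object.keys(toyResources) →
//   ["InvoicesData", "InvoicesAccessLogs", "ReceiptsData", "ReceiptsAccessLogs"]
```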

One of the most elegant ways I can illustrate how nice the CDK experience can be is to put a side-by-side comparison of the usage of Amazon States Language (ASL).

First, what it looks like as native ASL in CloudFormation:

{
  "DeliveryStepFunctionStateMachine": {
    "Type": "AWS::StepFunctions::StateMachine",
    "Properties": {
      "RoleArn": {
        "Fn::GetAtt": ["DeliveryStepFunctionStateMachineRoleC6479370", "Arn"]
      },
      "DefinitionString": {
        "Fn::Join": [
          "",
          [
            "{\"StartAt\":\"MapperTask\",\"States\":{\"MapperTask\":{\"Next\":\"SetStatusTo-pending\",\"Retry\":[{\"ErrorEquals\":[\"States.ALL\"],\"MaxAttempts\":10}],\"Parameters\":{\"FunctionName\":\"",
            {
              "Ref": "DeliveryStepFunctionMapper"
            },
            "\",\"Payload.$\":\"$\"},\"OutputPath\":\"$.Payload\",\"Type\":\"Task\",\"Resource\":\"arn:",
            {
              "Ref": "AWS::Partition"
            },
            ":states:::lambda:invoke\"},\"SetStatusTo-pending\":{\"Next\":\"retry seconds\",\"Type\":\"Task\",\"ResultPath\":null,\"Resource\":\"arn:",
            {
              "Ref": "AWS::Partition"
            },
            ":states:::dynamodb:updateItem\",\"Parameters\":{\"Key\":{\"pk\":{\"S.$\":\"$.pk\"},\"sk\":{\"S.$\":\"$.sk\"}},\"TableName\":\"",
            {
              "Ref": "PersistenceDDBTable"
            },
            "\",\"ExpressionAttributeNames\":{\"#status\":\"status\"},\"ExpressionAttributeValues\":{\":status\":{\"S\":\"pending\"}},\"ReturnValues\":\"ALL_NEW\",\"UpdateExpression\":\"SET #status = :status\"}},\"retry seconds\":{\"Type\":\"Wait\",\"SecondsPath\":\"$.retrySeconds\",\"Next\":\"SetStatusTo-in-progress\"},\"SetStatusTo-in-progress\":{\"Next\":\"DeliverTransactionTask\",\"Type\":\"Task\",\"ResultPath\":null,\"Resource\":\"arn:",
            {
              "Ref": "AWS::Partition"
            },
            ":states:::dynamodb:updateItem\",\"Parameters\":{\"Key\":{\"pk\":{\"S.$\":\"$.pk\"},\"sk\":{\"S.$\":\"$.sk\"}},\"TableName\":\"",
            {
              "Ref": "PersistenceDDBTable"
            },
            "\",\"ExpressionAttributeNames\":{\"#status\":\"status\"},\"ExpressionAttributeValues\":{\":status\":{\"S\":\"in-progress\"}},\"ReturnValues\":\"ALL_NEW\",\"UpdateExpression\":\"SET #status = :status\"}},\"DeliverTransactionTask\":{\"Next\":\"Delivery success?\",\"Retry\":[{\"ErrorEquals\":[\"States.ALL\"],\"MaxAttempts\":10}],\"Parameters\":{\"FunctionName\":\"",
            {
              "Ref": "DeliveryStepFunctionDeliverTransaction"
            },
            "\",\"Payload.$\":\"$\"},\"OutputPath\":\"$.Payload\",\"Type\":\"Task\",\"Resource\":\"arn:",
            {
              "Ref": "AWS::Partition"
            },
            ":states:::lambda:invoke\"},\"Delivery success?\":{\"Type\":\"Choice\",\"Choices\":[{\"Variable\":\"$.status\",\"StringEquals\":\"complete\",\"Next\":\"SetStatusTo-complete\"},{\"Variable\":\"$.status\",\"StringEquals\":\"failed\",\"Next\":\"SetStatusTo-failed\"}],\"Default\":\"SetStatusTo-pending\"},\"SetStatusTo-complete\":{\"End\":true,\"Type\":\"Task\",\"ResultPath\":null,\"Resource\":\"arn:",
            {
              "Ref": "AWS::Partition"
            },
            ":states:::dynamodb:updateItem\",\"Parameters\":{\"Key\":{\"pk\":{\"S.$\":\"$.pk\"},\"sk\":{\"S.$\":\"$.sk\"}},\"TableName\":\"",
            {
              "Ref": "PersistenceDDBTable"
            },
            "\",\"ExpressionAttributeNames\":{\"#status\":\"status\"},\"ExpressionAttributeValues\":{\":status\":{\"S\":\"complete\"}},\"ReturnValues\":\"ALL_NEW\",\"UpdateExpression\":\"SET #status = :status\"}},\"SetStatusTo-failed\":{\"End\":true,\"Type\":\"Task\",\"ResultPath\":null,\"Resource\":\"arn:",
            {
              "Ref": "AWS::Partition"
            },
            ":states:::dynamodb:updateItem\",\"Parameters\":{\"Key\":{\"pk\":{\"S.$\":\"$.pk\"},\"sk\":{\"S.$\":\"$.sk\"}},\"TableName\":\"",
            {
              "Ref": "PersistenceDDBTable"
            },
            "\",\"ExpressionAttributeNames\":{\"#status\":\"status\"},\"ExpressionAttributeValues\":{\":status\":{\"S\":\"failed\"}},\"ReturnValues\":\"ALL_NEW\",\"UpdateExpression\":\"SET #status = :status\"}}}}"
          ]
        ]
      }
    }
  }
}

Then with CDK (leveraging some existing constructs to handle editing the Amazon DynamoDB records for me):

const STATUS = "$.status";
const RETRY_SECONDS = "$.retrySeconds";
const PENDING = "pending";
const PROGRESS = "in-progress";
const FAILED = "failed";
const COMPLETE = "complete";

const setPending = stepFunction.setStatus(this, props.table, PENDING);
const setProgress = stepFunction.setStatus(this, props.table, PROGRESS);
const setComplete = stepFunction.setStatus(this, props.table, COMPLETE);
const setFailed = stepFunction.setStatus(this, props.table, FAILED);
const waitForNSeconds = this.waitTask("retry seconds", RETRY_SECONDS);

const definition = this.mapperTask()
  .next(setPending)
  .next(waitForNSeconds)
  .next(setProgress)
  .next(this.deliverTransactionTask())
  .next(
    new sfn.Choice(this, "Delivery success?")
      .when(sfn.Condition.stringEquals(STATUS, COMPLETE), setComplete)
      .when(sfn.Condition.stringEquals(STATUS, FAILED), setFailed)
      .otherwise(setPending)
  );

If you had to read the second code snippet to understand what the first was doing, I’d completely understand. Granted, there is nothing stopping CloudFormation from adopting and supporting a more elegant DSL. In fact, AWS SAM is really an attempt at exactly this with a focus on the serverless developer experience. 
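Part of what makes the CloudFormation version so hard to read is that the state machine definition is a JSON string chopped into pieces wherever a resource reference is needed, then glued back together at deploy time with `Fn::Join`. The sketch below mimics that stitching for the simple case of string and `Ref` parts only. The resolver function, the shortened definition, and the `mapper-fn` function name are illustrative assumptions, not CloudFormation's implementation.

```typescript
type RefNode = { Ref: string };
type JoinPart = string | RefNode;

interface FnJoin {
  "Fn::Join": [string, JoinPart[]];
}

// Replace each Ref with its resolved value, then join with the delimiter.
function resolveJoin(node: FnJoin, refs: Record<string, string>): string {
  const [delimiter, parts] = node["Fn::Join"];
  const resolved = parts.map((part) => {
    if (typeof part === "string") return part;
    const value = refs[part.Ref];
    if (value === undefined) throw new Error(`Unresolved Ref: ${part.Ref}`);
    return value;
  });
  return resolved.join(delimiter);
}

// A one-state version of the definition above, split the same way.
const definitionString: FnJoin = {
  "Fn::Join": [
    "",
    [
      '{"StartAt":"MapperTask","States":{"MapperTask":{"Type":"Task","End":true,"Resource":"arn:',
      { Ref: "AWS::Partition" },
      ':states:::lambda:invoke","Parameters":{"FunctionName":"',
      { Ref: "DeliveryStepFunctionMapper" },
      '","Payload.$":"$"}}}}',
    ],
  ],
};

// Usage: the resolved values here are made up for the example.
const resolvedAsl = JSON.parse(
  resolveJoin(definitionString, {
    "AWS::Partition": "aws",
    DeliveryStepFunctionMapper: "mapper-fn",
  })
);
// resolvedAsl.States.MapperTask.Resource → "arn:aws:states:::lambda:invoke"
```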

Given the current community momentum around CDK and growing investment from AWS, I expect to see more and more teams starting with CDK and happily continuing with it as their primary utility for infrastructure management.

Terraform

Terraform was introduced in 2014, with the goal of being able to orchestrate infrastructure as code. It first targeted AWS but has grown to support a large ecosystem of providers. In fact, multi-provider support is one of the main selling points of the technology.

Terraform introduced its own DSL, called HashiCorp Configuration Language (HCL). On the surface, it feels like a more human-friendly JSON. JSON is also natively supported within Terraform if you have a masochistic side.

Infrastructure as Code is just fancy state management

The biggest difference between Terraform and CloudFormation is how each actually interacts with the infrastructure itself. You hand CloudFormation a representation of your goal state, and it performs all the operations on your infrastructure to get there for you, natively within the platform. Terraform likewise takes a representation of your goal state, but it constructs a plan of API calls that it makes directly against your infrastructure to get to that state.
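Stripped of everything that makes it hard in practice, the "goal state in, plan out" idea can be sketched as a diff between two maps. The following is a toy model under heavy assumptions: resources are flat string maps, the type and function names are invented here, and there is no dependency ordering or replace-versus-update logic. It is not how Terraform or CloudFormation is implemented, only the core comparison both approaches rest on.

```typescript
type ResourceProps = Record<string, string>;
type InfraState = Record<string, ResourceProps>;

interface PlanStep {
  action: "create" | "update" | "delete";
  id: string;
}

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function sameProps(a: ResourceProps, b: ResourceProps): boolean {
  const aKeys = Object.keys(a);
  return (
    aKeys.length === Object.keys(b).length &&
    aKeys.every((key) => hasOwn(b, key) && a[key] === b[key])
  );
}

// Compare the goal state with the last known state and list what to change.
function buildPlan(desired: InfraState, current: InfraState): PlanStep[] {
  const steps: PlanStep[] = [];
  for (const id of Object.keys(desired).sort()) {
    if (!hasOwn(current, id)) {
      steps.push({ action: "create", id });
    } else if (!sameProps(desired[id], current[id])) {
      steps.push({ action: "update", id });
    }
  }
  // Anything that exists but is no longer described gets removed.
  for (const id of Object.keys(current).sort()) {
    if (!hasOwn(desired, id)) steps.push({ action: "delete", id });
  }
  return steps;
}

// Usage with made-up resources.
const examplePlan = buildPlan(
  {
    "assets-bucket": { versioning: "on" },
    "web-sg": { ingress: "443" },
  },
  {
    "web-sg": { ingress: "443,22" },
    "old-queue": { fifo: "false" },
  }
);
// examplePlan →
//   [ { action: "create", id: "assets-bucket" },
//     { action: "update", id: "web-sg" },
//     { action: "delete", id: "old-queue" } ]
```

The interesting question is where `current` comes from and how trustworthy it is, which is exactly where the two tools have historically differed.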

In a perfect world, both approaches work flawlessly. But this is the cloud we are talking about. “Everything fails all the time,” as Werner Vogels says. Until recently, Terraform was superior in terms of being able to recover from people going outside the process to update resources.

Terraform was able to resolve inconsistencies and refresh a correct state of the infrastructure even if someone had manually edited that security group “just to test something”. CloudFormation struggled with these inconsistent states, but the introduction of Drift Detection attempted to solve some of this headache.
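Conceptually, detecting that kind of out-of-band edit means comparing what the tool believes it deployed with what is actually live. Here is a small sketch of that comparison for a single resource. The status names are borrowed from the values CloudFormation reports for resource drift, but the function, the flat property model, and the sample security group are assumptions for illustration and say nothing about how either tool implements the check.

```typescript
type DriftStatus = "IN_SYNC" | "MODIFIED" | "DELETED";

interface PropertyDrift {
  property: string;
  expected: string | undefined;
  actual: string | undefined;
}

interface DriftReport {
  status: DriftStatus;
  differences: PropertyDrift[];
}

function ownValue(obj: Record<string, string>, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
}

// expected: properties recorded at deploy time.
// actual: properties read from the live resource, or undefined if it is gone.
function detectDrift(
  expected: Record<string, string>,
  actual: Record<string, string> | undefined
): DriftReport {
  if (actual === undefined) return { status: "DELETED", differences: [] };

  // Union of property names from both sides.
  const names: Record<string, true> = {};
  for (const key of Object.keys(expected).concat(Object.keys(actual))) {
    names[key] = true;
  }

  const differences: PropertyDrift[] = [];
  for (const property of Object.keys(names).sort()) {
    const want = ownValue(expected, property);
    const have = ownValue(actual, property);
    if (want !== have) differences.push({ property, expected: want, actual: have });
  }
  return { status: differences.length > 0 ? "MODIFIED" : "IN_SYNC", differences };
}

// Usage: someone opened port 22 by hand "just to test something".
const securityGroupDrift = detectDrift(
  { IngressPorts: "443" },
  { IngressPorts: "443,22" }
);
// securityGroupDrift.status → "MODIFIED"
```

Detecting drift is only half the story: a tool still has to decide whether to overwrite the manual change, adopt it, or stop and ask.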

Even if you didn’t start with IaC, you can move to it

Terraform also still offers the more elegant story of importing unmanaged resources, or resources from other stacks. CloudFormation offers this, but only for the subset of resources that support drift detection.

In addition to these benefits, Terraform is really the one true option for “learn once, utilize most places”. Regardless of your feelings on multi-cloud or hybrid-cloud, the appeal of training up yourself or staff on a singular technology that can benefit from knowledge transfer across many different possible targets is tempting.

It’s no longer one or the other

Recently, CDK introduced the Cloud Development Kit for Terraform. Effectively, it allows developers to write CDK code that targets Terraform under the hood instead of CloudFormation. This is the closest we can get in the cloud world to having our cake and eating it too: you could imagine a CDK application that uses CloudFormation for your AWS stack targets and Terraform for external provider stack targets.

Others

The IaC space is growing, and everyone has their own opinion on how things should work. I’d argue competition is healthy, and in some cases it has forced the providers themselves to step their game up.

AWS Amplify CLI: A CLI toolchain for simplifying serverless web and mobile development. If you are primarily a frontend developer, or just want to get going as fast as possible, look no further. The Amplify CLI and framework manages all the complexity behind the scenes to help you build and deploy real-time web and mobile applications.
Pulumi: If the Terraform and CDK teams got together and reimagined things, I get the sense it would look a bit like Pulumi.
Troposphere: The troposphere library allows for easier creation of the AWS CloudFormation JSON by writing Python code to describe the AWS resources. Troposphere also includes some basic support for OpenStack resources via Heat.
InGraph: InGraph is an open-source and declarative infrastructure graph DSL for AWS CloudFormation. The key feature is the ability to create composable infrastructure components while preserving the rigorous semantics of the AWS CloudFormation language.
Serverless Framework: Zero-friction serverless development. Easily build apps that auto-scale on low cost, next-gen cloud infrastructure.

Which one you should use

Given the vast amount of choices and business requirements that are out there, it’s irresponsible to levy a one-size-fits-all opinion in a 1600-word article. Rather, I’d approach it with a series of questions to ask yourself when considering your options.

Am I working on a simple, mostly serverless solution with minimal dependencies or dependents? CloudFormation (particularly AWS SAM) is likely enough.
Do I have a top-down distribution of best practices and orchestration? CDK or Terraform.
Do I want to stay entirely within the AWS ecosystem? CloudFormation or CDK.
Do I need to orchestrate resources outside the AWS ecosystem? Terraform or CDK for Terraform.
Do I want a multi-provider utility, especially for multi/hybrid cloud knowledge transfer? Terraform.
Choosing the right IaC tool on AWS

The only truly wrong answer is the one that prevents you from building anything at all.

Trek10 is an AWS Premier Consulting Partner focusing on cloud-native and serverless applications.
