
The challenges of blue/green deployment with AWS Lambda and CloudFormation

A Cloud Guru News

I’ve been thinking a lot about how I want serverless code and infrastructure evolution to work

I’ve been thinking a lot about deployment of Lambda and CloudFormation. The addition of weighted aliases for Lambda and canarying for API Gateway means that phased rollouts of code that look “in-place” using the same endpoint are now possible — usually called blue/green deployments.
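For anyone who hasn’t used weighted aliases yet: an alias points at a primary version and can send a fraction of invocations to one additional version. The Python sketch below builds the keyword arguments for boto3’s Lambda update_alias call without calling AWS. The helper name alias_routing_kwargs and the function and alias names in the usage are hypothetical.

```python
def alias_routing_kwargs(function_name, alias_name, stable_version, new_version, new_weight):
    """Build kwargs for boto3's Lambda update_alias call that send `new_weight`
    of invocations to `new_version` and the rest to `stable_version`."""
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must be between 0.0 and 1.0")
    return {
        "FunctionName": function_name,
        "Name": alias_name,
        # The primary version receives whatever traffic is not routed elsewhere.
        "FunctionVersion": stable_version,
        # Lambda routes this fraction of traffic to the additional version.
        "RoutingConfig": {"AdditionalVersionWeights": {new_version: new_weight}},
    }
```

Assuming versions 1 and 2 are already published, passing alias_routing_kwargs("my-function", "live", "1", "2", 0.1) to boto3.client("lambda").update_alias(**kwargs) would send roughly 10% of invocations to version 2.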

One of the consequences of this approach is that your application has more than one version of a function’s code active at one time.

Fundamentally, versioned Lambda code doesn’t feel like it belongs in CloudFormation. In the end, the set of versions of a Lambda function is probably going to look like a replica of your repository history.

You only want the in-use code (your active branches) to be present in the infrastructure that CloudFormation manages for your application. So I got to thinking about how functions could be published outside of CloudFormation while somehow still being linked to the resources inside the template.
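One way to pin down “in-use code”: a version is in use if some alias routes traffic to it, either as its primary version or as a weighted additional version. This is a minimal sketch, assuming input dicts shaped like the entries returned by the Lambda ListAliases and ListVersionsByFunction APIs; the helper names are hypothetical.

```python
def versions_in_use(aliases):
    """Versions referenced by any alias, as primary or weighted additional version."""
    in_use = set()
    for alias in aliases:
        in_use.add(alias["FunctionVersion"])
        routing = alias.get("RoutingConfig") or {}
        in_use.update((routing.get("AdditionalVersionWeights") or {}).keys())
    return in_use


def unused_versions(published, aliases):
    """Published versions (excluding $LATEST) that no alias points at, oldest first."""
    in_use = versions_in_use(aliases)
    return sorted((v for v in published if v != "$LATEST" and v not in in_use), key=int)
```

Everything unused_versions returns is repository history rather than active code, which is exactly the part that doesn’t feel like it belongs in the template.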

Except… function configuration changes slowly, which makes it feel like it does belong in CloudFormation. The configuration specifies the IAM role, which should be in CloudFormation, and the environment variables may reference other CloudFormation resources.

However, the configuration is versioned with the code, so deploying code outside of CloudFormation would have to pull in values from a stack. The stack (i.e., the weighted aliases) would then also need to reference the Lambda version created by that deployed code.
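To illustrate the first half of that weave, here is a sketch of a deploy script outside CloudFormation filling in environment variables from a stack’s outputs. It assumes outputs shaped like the Outputs list that CloudFormation’s DescribeStacks returns; the “${stack:Name}” placeholder syntax and the function name resolve_environment are made up for this example.

```python
import re

# Hypothetical placeholder syntax for this sketch: "${stack:OutputKey}".
_PLACEHOLDER = re.compile(r"\$\{stack:([A-Za-z0-9]+)\}")


def resolve_environment(env_template, stack_outputs):
    """Replace stack placeholders in environment variable values with stack output values."""
    outputs = {o["OutputKey"]: o["OutputValue"] for o in stack_outputs}

    def substitute(match):
        key = match.group(1)
        if key not in outputs:
            raise KeyError(f"stack has no output named {key!r}")
        return outputs[key]

    return {name: _PLACEHOLDER.sub(substitute, value) for name, value in env_template.items()}
```

The second half, the stack’s weighted alias pointing back at the version this script publishes, is what makes the round trip awkward.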

So it weaves in and out of CloudFormation. Not great.

I created a diagram below that depicts the challenge. It’s actually annoying to diagram concisely because it’s all about how it evolves over time. 😒

The ever-growing list of Lambda versions doesn’t seem like it belongs in CloudFormation

The ever-growing list of Lambda versions doesn’t seem like it belongs in CloudFormation, because the template would just grow and grow. The function configuration is versioned along with the code, and therefore would be deployed outside CloudFormation, but it’s slowly changing. The functions also reference resources that live in CloudFormation. Both of these are clear indicators that the configuration should live within the template.

Once I figure out the desired model, whatever it is, I’ll create custom resources to accomplish it, but I hope it would be something that could work for SAM in the future.

My current thoughts on this challenge:

There would be a resource that represents the Lambda function existence: it creates the name (and only by necessity, deploys non-functional code as a placeholder). Unlike the existing AWS::Lambda::Function resource, it would not take any properties other than (optionally) FunctionName. I’ll call this resource type AWS::Lambda::Placeholder.
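Because Lambda won’t create a function without code, the Placeholder resource has to upload something. As a sketch, assuming a Python runtime and a handler named index.handler (both choices are assumptions, not part of the proposal), this builds an in-memory zip whose only handler fails loudly if it is ever invoked.

```python
import io
import zipfile

# Non-functional code: invoking the placeholder should fail clearly, not silently succeed.
PLACEHOLDER_HANDLER = '''def handler(event, context):
    raise RuntimeError("placeholder function: no version has been deployed yet")
'''


def placeholder_package():
    """Return the bytes of a zip archive containing only the placeholder handler."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.py", PLACEHOLDER_HANDLER)
    return buffer.getvalue()
```

A custom resource implementing AWS::Lambda::Placeholder could pass these bytes as Code={"ZipFile": ...} to Lambda’s CreateFunction, with Handler="index.handler".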

There would be a resource type for deploying a version. It would take all the properties of AWS::Lambda::Function, with two changes:

  1. FunctionName would be required; this is where you would put a reference to the Placeholder resource
  2. CodeSha256 would be available but optional, to allow for the prevention of race conditions (like the existing AWS::Lambda::Version resource)

I’ll call this resource type AWS::Lambda::Deployment. It would return the version ARN when ref’d in the template — and probably the version number as an attribute.
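The CodeSha256 guard relies on Lambda reporting a base64-encoded SHA-256 digest of the deployment package. The sketch below computes that value locally and shows the comparison a Deployment resource could make before publishing. check_before_publish is a hypothetical name; the real PublishVersion API performs an equivalent check on its own when given CodeSha256.

```python
import base64
import hashlib


def code_sha256(package: bytes) -> str:
    """Base64-encoded SHA-256 digest of a deployment package (Lambda's CodeSha256 format)."""
    return base64.b64encode(hashlib.sha256(package).digest()).decode("ascii")


def check_before_publish(expected_sha256, current_sha256):
    """Refuse to publish if the function's code changed after we computed expected_sha256.

    Passing None for expected_sha256 skips the check, mirroring the optional property.
    """
    if expected_sha256 is not None and expected_sha256 != current_sha256:
        raise RuntimeError("code changed since it was uploaded; refusing to publish")
```

Without the guard, two overlapping deployments could each publish a version containing the other’s code.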

Unlike AWS::Lambda::Version, AWS::Lambda::Deployment would be updatable for any field, and any update would cause a new version to be published. The stack would cease to hold any reference to the version that it previously deployed, but that version would not actually be removed.
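The intended update semantics can be modeled in memory. This is a sketch of the proposed (not existing) AWS::Lambda::Deployment behavior: FakeFunction stands in for Lambda’s published versions, and both class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFunction:
    """In-memory stand-in for a Lambda function's published versions."""
    name: str
    versions: dict = field(default_factory=dict)  # version number -> properties

    def publish(self, properties):
        number = str(len(self.versions) + 1)
        self.versions[number] = dict(properties)
        return number


@dataclass
class Deployment:
    """Hypothetical AWS::Lambda::Deployment: every create or update publishes a version."""
    function: FakeFunction
    properties: dict
    version: str = ""

    def __post_init__(self):
        self.version = self.function.publish(self.properties)

    def update(self, new_properties):
        # The stack only remembers the newest version; the old one stays published.
        self.properties = dict(new_properties)
        self.version = self.function.publish(self.properties)
        return self.version
```

For example, creating a Deployment with {"MemorySize": 128} and then updating it to {"MemorySize": 256} leaves the resource pointing at version 2, while version 1 still exists on the function.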

Multiple Deployment Resources
You could have multiple Deployment resources, one for each place in the template where you’re referencing a version: each of the versions that has a connection in the above diagram.

It’d be useful to have a way to reference an existing version of a Lambda function, so that if a previous version needs to be referenced again, it can just get dropped in. Maybe AWS::Lambda::ExistingDeployment.
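An ExistingDeployment resource would mostly need to turn a function and a version number into a qualified version ARN. A sketch, assuming an unqualified function ARN of the standard form arn:aws:lambda:region:account:function:name; version_arn is a hypothetical name.

```python
def version_arn(function_arn: str, version: str) -> str:
    """Qualified ARN for an already-published version of a Lambda function."""
    if not version.isdigit():
        raise ValueError("expected a published version number, e.g. '3'")
    parts = function_arn.split(":")
    # Unqualified function ARNs have 7 colon-separated parts:
    # arn:aws:lambda:<region>:<account>:function:<name>
    if len(parts) != 7 or parts[5] != "function":
        raise ValueError("expected an unqualified Lambda function ARN")
    return f"{function_arn}:{version}"
```

Nothing is published here, which is the point: the resource would only reattach a version that already exists.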

Using the original diagram and annotating the AWS::Lambda::Deployment resources in blue, the evolution would look something like this:

Step 1: We’ve just deployed the very first version:

Step 1: Deploying the first version

Step 2: Now suppose we’ve just updated the configuration to give the Lambda function more memory. This wouldn’t require us to use the weighted alias to roll out — we’d just update the Deployment resource.

The weighted alias resource would get updated with the new version reference. Function version v1 would no longer appear anywhere in the stack, but it still exists.

Step 2: Weighted alias updated with new version reference

Step 3: Now that we’ve got a code update, we want to roll it out using the weighted alias. We’d add a new Deployment resource to deploy the new code, specifying the function configuration as well (even if it hasn’t changed), and point the weighted alias at it. We’d keep both Deployment resources around during the rollout, since both are referenced in the stack.

Step 3: Rollout using the weighted alias
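During step 3 the alias keeps the old version as primary and shifts a growing share of traffic to the new one. A sketch of a linear schedule of alias settings, assuming the Deployment resources produced versions 2 and 3 and that four equal steps are wanted; rollout_steps is a hypothetical helper, and each entry corresponds to one stack update.

```python
def rollout_steps(old_version, new_version, steps):
    """Alias settings for each intermediate step of a linear rollout.

    Each entry is (FunctionVersion, AdditionalVersionWeights): the old version stays
    primary while new_version receives an increasing share of traffic.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2 to have an intermediate step")
    return [
        (old_version, {new_version: round(i / steps, 4)})
        for i in range(1, steps)
    ]
```

For versions 2 and 3 with four steps this gives weights of 0.25, 0.5, and 0.75 for version 3. Both versions are referenced at every step, which is why both Deployment resources must stay in the template until the rollout finishes.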

Step 4: Once the rollout is complete, we can update the stack to remove the older Deployment resource and update the weighted alias to remove its reference to that Deployment.

Step 4: Update the stack to remove the older Deployment resource
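Step 4 is a pure template edit. The sketch below works on a simplified template held as a Python dict; the AWS::Lambda::Deployment type and its Version attribute are the hypothetical ones proposed above, and finish_rollout is an illustrative name.

```python
def finish_rollout(template, old_logical_id, new_logical_id, alias_logical_id):
    """Return a copy of a simplified template in which the alias points only at the
    new Deployment and the old Deployment resource is removed."""
    # Shallow-copy each resource so the input template is left untouched.
    resources = {k: dict(v) for k, v in template["Resources"].items()}
    alias = resources[alias_logical_id]
    props = dict(alias["Properties"])
    props["FunctionVersion"] = {"Fn::GetAtt": [new_logical_id, "Version"]}
    # No more weighted routing: all traffic goes to the new version.
    props.pop("RoutingConfig", None)
    alias["Properties"] = props
    del resources[old_logical_id]
    return {**template, "Resources": resources}
```

The old version is no longer referenced by the stack after this update, but, as in step 2, it remains published on the function.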

Using this approach, here’s the updated diagram indicating the usage of Deployment resources.

Updated diagram showing usage of Deployment resources

For more concise templates, one option would be to provide a separate Configuration resource. It wouldn’t actually do anything itself, but it could be referenced by multiple deployments.
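A sketch of how a Configuration resource could be combined with each Deployment: the shared properties come from the configuration, and only the code-related properties vary per deployment. The Configuration resource, the helper name, and the example values are all hypothetical.

```python
def deployment_properties(configuration, function_name, code, code_sha256=None):
    """Merge a shared (hypothetical) Configuration resource's properties with the
    per-deployment properties of one AWS::Lambda::Deployment."""
    per_deployment = {"FunctionName": function_name, "Code": code}
    if code_sha256 is not None:
        per_deployment["CodeSha256"] = code_sha256
    # Keep the split clean: the shared configuration must not set per-deployment keys.
    overlap = set(configuration) & set(per_deployment)
    if overlap:
        raise ValueError(f"configuration must not set per-deployment keys: {sorted(overlap)}")
    return {**configuration, **per_deployment}
```

In step 3, the old and new Deployment resources could then reference the same Configuration and differ only in their Code.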

What I am modeling is based on a notional system where, in your infrastructure graph (such as CloudFormation), you don’t have functions as a first-class concept. Instead, imagine that you are telling your event sources “execute this code artifact (from S3), with this IAM role and this configuration”.

In your infrastructure graph, you can create an edge between two nodes: the event source and the function code. The edge, a first-class concept, contains the configuration for the code to invoke.
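The notional model can be written down as a small data structure: event sources and code artifacts are nodes, and the invocation configuration lives on the edge between them. This is an illustrative sketch only; every class and field name is hypothetical and doesn’t correspond to any AWS API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventSource:
    name: str


@dataclass(frozen=True)
class CodeArtifact:
    s3_bucket: str
    s3_key: str


@dataclass(frozen=True)
class Invocation:
    """The edge: 'execute this code artifact, with this IAM role and this configuration'."""
    source: EventSource
    code: CodeArtifact
    role_arn: str
    environment: tuple = ()  # sorted (name, value) pairs, so the edge stays hashable


@dataclass
class InfrastructureGraph:
    edges: list = field(default_factory=list)

    def connect(self, source, code, role_arn, **environment):
        edge = Invocation(source, code, role_arn, tuple(sorted(environment.items())))
        self.edges.append(edge)
        return edge

    def handlers_for(self, source):
        return [e for e in self.edges if e.source == source]
```

There is no function node at all: two event sources can share one code artifact, each with its own role and configuration on its own edge.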

This isn’t the future I necessarily want, but I think it’s useful directionally to think about ephemeral compute as where your business logic fits into the system to glue it together.

Drop your thoughts in the comments below or connect with me on Twitter. I’d be interested in your thoughts on the challenges and approaches with blue/green deployments using Lambda and CloudFormation.
