
How long does AWS Lambda keep your idle functions around before a cold start?

A Cloud Guru News

Using AWS Step Functions to find the longest time your AWS Lambda function can idle before the resources are reclaimed

In a recent experiment, I compared the cold start times of AWS Lambda using different languages, memory allocation, and sizes of deployment package.

A cold start occurs when an AWS Lambda function is invoked after not being used for an extended period of time resulting in increased invocation latency.

One of the interesting observations was that functions are no longer recycled after 5 minutes of inactivity — which makes cold starts far less punishing.

How does language, memory and package size affect cold starts of AWS Lambda? Comparing the cold start times of AWS Lambda using different languages, memory allocation, and sizes of deployment… (read.acloud.guru)

During the experiment, some of my functions didn’t experience a cold start until after 30 minutes of idle time. The longer period of inactivity is something Amazon quietly changed behind the scenes, and it is fantastic news. However, the change prompted me to ask a few follow-up questions:

  1. what’s the new period of inactivity which triggers a cold start?
  2. does memory allocation impact the idle time before a cold start?

To satisfy my curiosity, I devised an experiment and hypotheses. The experiment is intended to help us glimpse into implementation details of the AWS Lambda platform.

Since AWS can, and will, change these implementation details without notice, you shouldn’t build your application around these results!

Hypothesis 1

There is an upper bound to how long Lambda allows your function to stay idle before reclaiming the associated resources

This should be a given — it simply wouldn’t make any sense for AWS to keep idle functions around forever. Idle functions occupy resources that can be used to help other AWS customers scale up to meet their needs. And most importantly to AWS, an inactive function is not paying the bills.

Hypothesis 2

The idle timeout is not a constant

From a developer’s point-of-view, a consistent and published idle period before a cold start would be preferred — e.g. functions are always terminated after X mins of inactivity.

However, AWS will most likely vary the timeout to optimize for higher utilization periods. This allows them to keep performance levels more evenly distributed across their fleet of physical servers. For example, if there’s an elevated level of resource contention in a region, it makes sense for AWS to shorten the idle timeout and terminate functions sooner to free up resources.

Hypothesis 3

The upper bound for inactivity varies by memory allocation

An idle function with 1536 MB of memory allocation is wasting a lot more resources than an idle function with 128 MB of memory. It makes sense for AWS to terminate idle functions with higher memory allocation earlier.

Experimenting to find the upper bound for inactivity

To find the upper bound for inactivity, we first need to create a Lambda function to act as the system-under-test to report when it has experienced a cold start.
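One common way for a function to detect its own cold start is to keep a flag at module level: the flag is initialised once per container, so it is only true on the first invocation that container serves. The following is a minimal sketch of that idea in Python, not the code used in the experiment; the handler name and the `isColdstart` field are assumptions made for illustration.

```python
# Module-level state survives between invocations only while Lambda keeps
# reusing the same container, so this flag is True exactly once per container.
_is_cold_start = True


def handler(event, context):
    """Report whether this invocation ran in a freshly created container."""
    global _is_cold_start
    was_cold = _is_cold_start
    _is_cold_start = False
    # "isColdstart" is a hypothetical field name chosen for this sketch.
    return {"isColdstart": was_cold}
```

Calling the handler twice in the same process returns a cold start for the first call and a warm start for the second, which is the signal the rest of the experiment relies on.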

We’ll then need a mechanism to progressively increase the interval between invocations until we arrive at a place where each invocation is guaranteed to be a cold start — the upper bound. The value of the upper bound is determined when ten (10) consecutive cold starts are observed after being invoked X minutes apart.
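The search rule described above can be sketched as a pure function over the experiment state. This is an illustration under stated assumptions rather than the original implementation: the 60-second step is inferred from the 600 to 660 example input used later in this post, and resetting the counter on a warm start is an assumption that follows from requiring the cold starts to be consecutive. The names `next_state` and `is_done` are hypothetical.

```python
STEP_SECONDS = 60          # assumed increment between attempts
REQUIRED_COLD_STARTS = 10  # consecutive cold starts that define the upper bound


def next_state(state, was_cold_start):
    """Return the next experiment state after one invocation of the target."""
    if was_cold_start:
        # Keep probing the same interval and extend the streak.
        return {**state, "coldstarts": state["coldstarts"] + 1}
    # A warm start breaks the streak (assumed), so wait longer next time.
    return {
        **state,
        "interval": state["interval"] + STEP_SECONDS,
        "coldstarts": 0,
    }


def is_done(state):
    """True once the streak is long enough to call the interval the upper bound."""
    return state["coldstarts"] >= REQUIRED_COLD_STARTS
```

Keeping the rule free of AWS calls makes it easy to check by hand: a warm start at 600 seconds yields an interval of 660 with the counter at 0, and ten cold starts in a row at the same interval end the search.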

To answer hypothesis 3 — the impact of memory — we will also replicate the system-under-test function with different memory allocations.

This experiment is a time-consuming process that requires discipline and a degree of precision in timing. Suffice it to say, I won’t be doing this by hand!

Step Functions to set up the experiment

My first approach was to use a CloudWatch Schedule to trigger the system-under-test function, and let the function dynamically adjust the schedule based on whether it’s experienced a cold start.

This approach failed miserably. Whenever the system-under-test function updates the schedule, the schedule fires shortly thereafter instead of waiting for the newly specified interval.

Instead, I turned to Step Functions for help.

AWS Step Functions allows you to create a state machine where you can invoke Lambda functions, wait for a specified amount of time, execute parallel tasks, retry, catch errors, and much more.

Below is the state machine used to carry out this experiment. The visual workflow depicts how the FindIdleTimeout state will invoke the system-under-test function. Depending on its output, it either completes the experiment or waits before recursing.

AWS Step Functions Visual Workflow

The wait state allows you to drive the number of seconds to wait using data. See the SecondsPath parameter in the documentation for more details. The wait state allowed me to start the state machine with an input like this:

{
    "target": "when-will-i-coldstart-dev-system-under-test-128",
    "interval": 600,
    "coldstarts": 0
}
  • The input is then passed to another function, find-idle-timeout, as its invocation event.
  • The function will invoke the target — which is one of the variants of the system-under-test function with different memory allocations
  • It will then increase the interval if the system-under-test function doesn’t report a cold start.
  • Finally, the find-idle-timeout function will return a new piece of data for the Step Function execution
{
    "target": "when-will-i-coldstart-dev-system-under-test-128",
    "interval": 660,
    "coldstarts": 0
}
  • At this point, the wait state will use the interval value and wait 660 seconds before switching back to the FindIdleTimeout state.
  • It will then invoke the find-idle-timeout function again — using the previous output as input.
"Wait": {
    "Type": "Wait",
    "SecondsPath": "$.interval",
    "Next": "FindIdleTimeout"
},
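Only the Wait state is shown above. A fuller definition of the loop might look like the following sketch, written as a Python dict so it can be serialised to Amazon States Language JSON. Everything beyond the FindIdleTimeout and Wait state names is an assumption: the `RepeatOrNot` and `Done` state names, the placeholder ARN, and the choice to finish once `$.coldstarts` reaches 10 are there for illustration only.

```python
import json


def build_state_machine(find_idle_timeout_arn, required_cold_starts=10):
    """Build an Amazon States Language definition for the experiment loop."""
    return {
        "StartAt": "FindIdleTimeout",
        "States": {
            # Invokes the find-idle-timeout function with the current state.
            "FindIdleTimeout": {
                "Type": "Task",
                "Resource": find_idle_timeout_arn,
                "Next": "RepeatOrNot",
            },
            # Hypothetical Choice state: finish once enough cold starts are seen.
            "RepeatOrNot": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.coldstarts",
                        "NumericGreaterThanEquals": required_cold_starts,
                        "Next": "Done",
                    }
                ],
                "Default": "Wait",
            },
            # Same Wait state as in the article: the delay comes from the data.
            "Wait": {
                "Type": "Wait",
                "SecondsPath": "$.interval",
                "Next": "FindIdleTimeout",
            },
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    # Placeholder ARN, not a real resource.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:find-idle-timeout"
    print(json.dumps(build_state_machine(arn), indent=2))
```

Because the Task output replaces the execution state, the Wait state always reads the interval that the find-idle-timeout function just returned.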

With this setup I’m able to kick off multiple executions, one for each memory setting. Using the Step Functions dashboard, you can observe the active executions for your state machine.
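Starting one execution per memory setting could be scripted along these lines. This is a rough sketch, not the author's tooling: the helper names are hypothetical, the function-name prefix is taken from the example input above, and the list of memory sizes is left to the caller because the full set used in the experiment is not listed here. `start_executions` assumes boto3 is installed and AWS credentials are configured.

```python
import json

# Prefix taken from the example input; the memory size is appended to it.
TARGET_PREFIX = "when-will-i-coldstart-dev-system-under-test-"


def build_execution_inputs(memory_sizes_mb, start_interval=600):
    """Return one JSON input document per memory size."""
    return [
        json.dumps(
            {
                "target": f"{TARGET_PREFIX}{mb}",
                "interval": start_interval,
                "coldstarts": 0,
            }
        )
        for mb in memory_sizes_mb
    ]


def start_executions(state_machine_arn, memory_sizes_mb):
    """Start one Step Functions execution per memory size."""
    import boto3  # imported lazily so the input builder works without it

    client = boto3.client("stepfunctions")
    for payload in build_execution_inputs(memory_sizes_mb):
        client.start_execution(stateMachineArn=state_machine_arn, input=payload)
```

For example, `build_execution_inputs([128, 1536])` produces the inputs for the two memory sizes mentioned in this post.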

The Step Functions Dashboard

Along the way I have plenty of visibility into what’s happening, all from the comfort of the Step Functions management console. Using the Step Functions console, you can see the current state of the state machine.

The Step Functions Console

Using the Step Functions Console, you can also see the input and current output of the state machine.

Execution Details: Input

The first time the target function is invoked, it is guaranteed to be a cold start. Here you can see the current cold start count is one (1).

Execution Details: Output

Using the Step Functions Console, you can also see when the state transitions happened — and the relevant inputs and outputs at each transition.

State Transitions

The Results!

From the data, it’s clear that AWS Lambda shuts down idle functions around the hour mark. It’s interesting to note that the function with 1536 MB memory is terminated over 10 mins earlier. This finding supports hypothesis 3: idle functions with higher memory allocation will be terminated earlier.

To help analyze the results, I collected data on all the idle intervals where we saw a cold start and categorized them into 5 minute brackets.
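Grouping the idle intervals into 5 minute brackets is a small piece of arithmetic, sketched below. The helper names are hypothetical, and treating each bracket as half-open (45 mins falls into 45-50, not 40-45) is an assumption about how the boundaries were handled.

```python
from collections import Counter


def bracket_label(idle_seconds, width_minutes=5):
    """Map an idle interval in seconds to a label such as '45-50 mins'."""
    minutes = idle_seconds // 60
    lower = (minutes // width_minutes) * width_minutes
    return f"{lower}-{lower + width_minutes} mins"


def count_by_bracket(cold_start_intervals, width_minutes=5):
    """Count how many cold starts fell into each bracket."""
    return Counter(bracket_label(s, width_minutes) for s in cold_start_intervals)
```

Passing the list of idle intervals (in seconds) at which a cold start was observed gives the per-bracket counts that a table or chart can be built from.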

This table shows the number of cold starts that occurred before each function reached its upper bound idle time

From this chart, you can see that over 60% of the cold starts that occurred before the functions reached their upper bound for inactivity happened after 45 mins of idle time.

Even though the data is seriously lacking, the little data collected still allows us to observe some high level trends:

  • over 60% of cold starts happened after 45 mins of inactivity — prior to hitting the upper bound
  • the function with 1536 MB memory sees significantly fewer cold starts prior to hitting the upper bound
  • it’s worth noting that functions with 1536 MB also have a lower upper bound (48 mins) when compared to other functions

The data seems to support hypothesis 2, that the idle timeout is not a constant. There’s no way for us to figure out the reason behind these cold starts, or if there’s significance to the 45 mins barrier.

Conclusions

AWS Lambda will generally terminate functions after 45–60 mins of inactivity, although idle functions can sometimes be terminated a lot earlier to free up resources needed by other customers.

I hope you found this experiment interesting. It’s meant for fun and to satisfy a curious mind, and nothing more! Please don’t build applications on the assumption that these results are valid, or assume they will remain valid for the foreseeable future. You can find the source code for the experiment here.

theburningmonk/lambda-when-will-i-coldstart: Experiment to find out how long your function would need to be idle for for it to be… (github.com)

While I answered a few questions, the results from this experiment also deserve further investigation. For instance, the 1536 MB function exhibited very different behaviour compared to other functions. Is this a special case, or do functions with more than 1024 MB of memory all share these traits?

I’d love to find out. Maybe I’ll write a follow up to this experiment in the future. Watch this space 😉


