One of the biggest serverless announcements from AWS re:Invent 2020 is 1 ms billing for AWS Lambda functions. This is an industry-leading change that fundamentally alters the cost of running many serverless applications.
In this post, I explain the impact and show how you can optimize your Lambda functions to take advantage of this new feature. This can help you drive down the cost of running serverless applications even further.
How Lambda billing works
Lambda is an on-demand compute platform – it only runs in response to events, and you only pay when code is running.
There are two dimensions to Lambda pricing: requests and duration. A request is counted each time a function is invoked, and requests are charged at $0.20 per million, regardless of function size. Duration is calculated as a factor of time and memory usage at $0.0000166667 per GB-second. There is also a Free Tier allowance of 1 million requests and up to 400,000 GB-seconds of compute every month. You are not charged for any Lambda usage under this level.
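To make the two pricing dimensions concrete, here is a minimal sketch of the arithmetic using only the rates quoted above. The function name, its parameters, and the sample workload (5 million requests at 128 MB and 100 ms) are illustrative assumptions, not part of any AWS API. The Free Tier is applied per account rather than per function, so the sketch leaves it off by default.

```javascript
// Rates quoted in this post (USD).
const REQUEST_PRICE_PER_MILLION = 0.20;
const PRICE_PER_GB_SECOND = 0.0000166667;
const FREE_REQUESTS = 1000000;
const FREE_GB_SECONDS = 400000;

// Estimate one month's charges for a single function.
// memoryMb and billedMs describe a typical invocation (hypothetical inputs).
function estimateMonthlyCost({ requests, memoryMb, billedMs, applyFreeTier = false }) {
  // GB-seconds = invocations x memory in GB x billed duration in seconds.
  const gbSeconds = requests * (memoryMb / 1024) * (billedMs / 1000);

  // Optionally subtract the monthly Free Tier allowance, never going below zero.
  const billableRequests = applyFreeTier ? Math.max(0, requests - FREE_REQUESTS) : requests;
  const billableGbSeconds = applyFreeTier ? Math.max(0, gbSeconds - FREE_GB_SECONDS) : gbSeconds;

  const requestCost = (billableRequests / 1000000) * REQUEST_PRICE_PER_MILLION;
  const durationCost = billableGbSeconds * PRICE_PER_GB_SECOND;
  return { gbSeconds, requestCost, durationCost, total: requestCost + durationCost };
}

// Illustrative workload: 5 million requests, 128 MB, 100 ms billed per request.
console.log(estimateMonthlyCost({ requests: 5000000, memoryMb: 128, billedMs: 100 }));
```

With these inputs, the request charge and the duration charge are of similar size, which is why billed duration matters so much for short functions.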
Before this announcement, the minimum duration for billing was 100 ms and the billable duration was always rounded up to the nearest 100 ms. This means that a function with a 5 ms duration was rounded to 100 ms and a function with a 385 ms duration was rounded to 400 ms. With the move to 1 ms billing, the smallest duration value is now 1 ms and all other usage is rounded up to the next 1 ms.
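The rounding rule can be sketched in a few lines. The helper name `billedDuration` is an assumption for illustration; the sample durations are the two from the paragraph above.

```javascript
// Round a measured duration up to the billing granularity.
// The result is never less than one granularity unit (the billing minimum).
function billedDuration(durationMs, granularityMs) {
  const rounded = Math.ceil(durationMs / granularityMs) * granularityMs;
  return Math.max(granularityMs, rounded);
}

for (const ms of [5, 385]) {
  const before = billedDuration(ms, 100); // previous 100 ms granularity
  const after = billedDuration(ms, 1);    // new 1 ms granularity
  console.log(`${ms} ms measured: billed ${before} ms before, ${after} ms now`);
}
```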
From 100 ms to 1 ms rounding – the cost difference for existing code
Lambda is designed for very high volumes of traffic and many production-based applications result in millions of invocations. This change in the duration granularity provides cost savings for almost all Lambda functions, and this will automatically appear on your AWS billing statement.
To see the impact of this pricing change for different function durations, I compare three types of Lambda function:
- Short-lived (~100 ms): these typically operate as “glue”, minimally processing data between AWS services.
- Subsecond (under 1 second): for more complicated business logic handling more data.
- Longer-lived functions (under 1 minute): for computationally complex logic, or functions calling out synchronously to third-party services.
The following table takes a random sampling of durations for these three types of functions, then rounds the billable duration by 1 ms and 100 ms granularity:
This shows that the change to billable duration has the most impact on the cost of running short-lived functions, with up to 67% shorter billable durations in this example. For subsecond functions, the example shows a 10% drop in average duration, while it has less impact for longer-lived functions.
In practice, in most production workloads, Lambda functions run for brief amounts of time, so there are good opportunities for optimization here.
Memory, cost, and duration
One of the major configuration levers available to you as a Lambda developer is memory. You can set the allocated memory from 128 MB to 10 GB in 1 MB increments. Memory also controls the amount of virtual CPU available to your function, ranging from a percentage of a single core at 128 MB to six full virtual CPUs at maximum memory.
For many compute-intensive functions, increasing the memory shortens the overall duration. This means you can achieve a faster function with a negligible impact on cost in many cases. To find the optimal balance for your functions, there are a couple of options. You can manually test a function with different memory allocations to measure the effect on duration, or you can use the AWS Lambda Power Tuning tool.
This tool helps automate finding the balance between memory, duration and cost, and visualizes the results. For example, a compute-intensive function is evaluated at different memory levels. The resulting chart shows it performs fastest at 3008 MB but the lowest cost occurs at 1024 MB and 1536 MB:
When the minimum billing duration was 100 ms, there was no financial gain from optimizing functions that ran for less than 100 ms. While increasing memory may have saved milliseconds in these cases, the overall cost increased because the extra memory was billed against the same 100 ms minimum. With 1 ms billing, this changes: it’s now worth optimizing the efficiency of these functions. There are many steps you can take to optimize these short-running functions.
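The following sketch compares the duration charge for the same function at two memory settings under both billing granularities. The two measurements (80 ms at 128 MB, 35 ms at 256 MB) are hypothetical numbers chosen for illustration, and the helper names are assumptions; only the per-GB-second rate comes from this post.

```javascript
const PRICE_PER_GB_SECOND = 0.0000166667; // rate quoted earlier in this post

// Round a measured duration up to the billing granularity (minimum one unit).
function billedDuration(durationMs, granularityMs) {
  return Math.max(granularityMs, Math.ceil(durationMs / granularityMs) * granularityMs);
}

// Duration charge for one invocation at a given memory size and granularity.
function durationCost(memoryMb, durationMs, granularityMs) {
  const gbSeconds = (memoryMb / 1024) * (billedDuration(durationMs, granularityMs) / 1000);
  return gbSeconds * PRICE_PER_GB_SECOND;
}

// Hypothetical measurements: doubling memory makes this function faster.
const configs = [
  { memoryMb: 128, durationMs: 80 },
  { memoryMb: 256, durationMs: 35 },
];

for (const { memoryMb, durationMs } of configs) {
  const old100 = durationCost(memoryMb, durationMs, 100) * 1e6; // per 1M invocations
  const new1 = durationCost(memoryMb, durationMs, 1) * 1e6;
  console.log(
    `${memoryMb} MB, ${durationMs} ms: $${old100.toFixed(4)} at 100 ms rounding, ` +
    `$${new1.toFixed(4)} at 1 ms rounding (per 1M invocations)`
  );
}
```

Under 100 ms rounding, both settings are billed for 100 ms, so the larger memory size simply doubles the charge. Under 1 ms rounding, the faster 256 MB configuration in this made-up example becomes the cheaper of the two.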
Reusing TCP connections in Node.js
In Node.js, the default HTTPS agent creates a new TCP connection for every new request. If you are interacting with services like DynamoDB, the latency of this operation can be greater than the time taken for the database request. This latency is also compounded if your service request uses the AWS Key Management Service.
You can ensure that the TCP connection is reused by setting a flag that the AWS SDK for JavaScript has supported since version 2.463.0. The easiest way to apply this is to use an environment variable in your Lambda functions. When AWS_NODEJS_CONNECTION_REUSE_ENABLED is set to 1, warm invocations of the function reuse the connection.
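If you prefer to configure this in code rather than through the environment variable, the AWS SDK for JavaScript v2 also accepts a custom HTTPS agent. The sketch below builds a keep-alive agent with the Node.js standard library; the variable names are illustrative, and the SDK lines are left as comments so that the snippet runs without the `aws-sdk` package installed.

```javascript
const https = require('https');

// keepAlive: true asks the agent to keep sockets open between requests
// instead of opening a new TCP connection each time.
const agentOptions = { keepAlive: true };
const keepAliveAgent = new https.Agent(agentOptions);

// Options in the shape the AWS SDK for JavaScript v2 accepts for HTTP settings.
const clientOptions = { httpOptions: { agent: keepAliveAgent } };

// In a function that bundles aws-sdk v2, pass the options to a service client:
//   const AWS = require('aws-sdk');
//   const dynamodb = new AWS.DynamoDB(clientOptions);

console.log('Agent created with options:', agentOptions);
```

Create the agent outside the handler so that warm invocations share the same agent and its open sockets.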
This section refers to an example application in this code repo. I compare the latency of putting an item into an Amazon DynamoDB table, with and without the reuse flag. I run a load test on each function for 1,000 requests over 300 seconds. Using this CloudWatch Insights query, I can then analyze the duration averages and percentile metrics:
filter @type = "REPORT"
| stats avg(@duration) as Average,
    percentile(@duration, 99) as NinetyNinth,
    percentile(@duration, 95) as NinetyFifth,
    percentile(@duration, 90) as Ninetieth
  by bin(60m)
In v1 of the sample code with no reuse flag, these are the results:
In v2 of the sample code, the AWS SAM template sets the reuse flag as an environment variable:
AddToDDBfunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: addToDDBfunction/
    Handler: app.handler
    Runtime: nodejs12.x
    MemorySize: 128
    Timeout: 3
    Environment:
      Variables:
        AWS_NODEJS_CONNECTION_REUSE_ENABLED: 1
Running the same load test on this version, the CloudWatch Insights query shows the performance gain:
If both of these functions were invoked 1 million times, these average duration metrics show a cost reduction of 23%:
| Requests | Memory (MB) | Duration (ms) | GB-seconds | Request $ | Duration $ | Total |
Optimizing dependency usage for Node.js
The function code so far requires the entire AWS SDK:
const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION
const docClient = new AWS.DynamoDB.DocumentClient()
You can reduce the memory usage and initialization time by only requiring the services used in your function. To do this, update the require statement to specify the service, and then pass any configuration options to the service constructor, as shown in version 3 of the example:
const DynamoDB = require('aws-sdk/clients/dynamodb')
const documentClient = new DynamoDB.DocumentClient()
The load test on this version shows the following results in the CloudWatch Insights query:
Compared to the original version, this reduces the duration cost by around 75% and the overall cost by 26%:
| Requests | Memory (MB) | Duration (ms) | GB-seconds | Request $ | Duration $ | Total |
Other optimizations for sub-100 millisecond functions
There are several other factors to consider that can reduce the overall duration of a function. Your use-case will determine which ones you can apply.
First, your choice of runtime makes a difference. Although the Lambda service is agnostic to runtime, some runtimes are faster to initialize than others. For short functions, consider using runtimes like Node.js, Go and Python, which can typically run more quickly.
Keep your deployment package sizes as small as possible since larger packages often take longer to initialize. You should remove any requirements or library imports that are not used directly by the function. Also, avoid “monolithic” functions that contain multiple code paths, and minify your code wherever possible to help get the deployment package size as small as it can be.
Ensure that any global objects that require initialization, such as database connections, are defined in the global scope. This is the code outside of the Lambda handler function, also known as “INIT” code. The cost of any lengthy initialization code is then amortized over multiple invocations if the same execution environment is invoked multiple times.
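The pattern can be sketched as follows. `createClient` is a hypothetical stand-in for an expensive setup step such as opening a database connection, and the counters exist only to make the behavior visible; neither is part of the Lambda API.

```javascript
// Counters exist only to make the behavior visible in this sketch.
let initCount = 0;
let invocationCount = 0;

// Hypothetical stand-in for an expensive setup step, such as creating a database client.
function createClient() {
  initCount += 1;
  return { put: async (item) => ({ saved: item }) };
}

// INIT code: defined in the global scope, so it runs once per execution environment.
const client = createClient();

// Handler code: runs on every invocation and reuses the client created above.
const handler = async (event) => {
  invocationCount += 1;
  return client.put({ id: event.id });
};

exports.handler = handler;

// Simulate two warm invocations in the same execution environment.
Promise.all([handler({ id: 1 }), handler({ id: 2 })]).then(() => {
  console.log(`init ran ${initCount} time(s) for ${invocationCount} invocations`);
});
```

If `createClient()` were called inside the handler instead, the setup cost would be paid, and billed, on every invocation.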
With the new 1 ms billing for Lambda functions, optimizing code duration has a direct impact on the cost of running Lambda functions. Shorter-lived functions can often run much faster with only a few small changes, and this translates to lowering cost.
Memory allocation in Lambda determines the CPU power of your function. To help determine the optimal trade-off between duration, cost and memory, use the AWS Lambda Power Tuning tool. The ‘Reuse HTTP connection’ flag can help reduce latency in Node.js functions and you can enable this with an environment variable. Selectively choosing parts of the AWS SDK to import can also reduce duration without impacting the functionality of your business logic.
For more tips and tricks to help you get the most from your Lambda-based applications, visit Serverless Land.