To understand Lambda, let me tell you the story of how I came to understand and appreciate it the hard and expensive way. At the beginning of my AWS cloud journey, I was building a simple website to collect and process specific data. The ultimate plan was to build a serverless application that acts as a recommendation engine using the collected data. The website itself didn’t have any dynamic features, as I relied on external tools such as plain old Google Forms and CSV files to collect the data. The existing platform was working fine, but I wanted to try my hand at migrating the website and data to AWS.
“It will speed up the process of building the serverless application down the road.” – my brain reasoned. But if I’m being honest, I mainly wanted an excuse to tinker with AWS services.
I was excited to get started and jumped into drawing a crude diagram of the architecture without studying the available offerings. Once the diagram was done, I went ahead and migrated the website to EC2. Talk about under-planning and over-engineering a project.
Then life happened, and I decided to put the project on hold, so I diligently went ahead and took down all the assets and resources from AWS S3 while leaving nothing behind but a static “Coming Soon” page.
My role as a software engineer wired me to think of projects in terms of code, functionality, and product releases, which is why I took the features down. However, my methodology at the time didn’t account for the behind-the-scenes infrastructure, or the fact that the needs of the website had drastically changed. So I left the EC2 instance type and configuration as-is.
EC2 is for the strong
The fact that I can’t recall how I configured that particular instance (I can’t even remember whether it was a t2.micro or a t2.small) indicates that I shouldn’t have been tinkering with EC2.
However, I remember too well my confusion and surprise when I saw a rather hefty recurring AWS expense listed on my bank statements.
My confusion stemmed from the fact that the website didn’t have any active visitors. Moreover, my S3 bucket was practically empty. I’d hosted more complicated websites at a much lower cost, so what was causing the recurring charges?
If you’re a cloud guru, you probably already know that EC2 comes with a lot of flexibility, and with flexibility comes more responsibility. Developers who use EC2 instances to host applications need a solid understanding of various technical components, such as:
- How to pick the right instance type and instance purchasing option for the project
- How to scale EC2 and how to balance the workload on instances
- How to work with Target Groups and Application Load Balancers
- When to shut down instances to avoid incurring unnecessary charges
Needless to say, the last one wasn’t in my AWS Cloud toolkit at the time. Once I connected the dots, I realized I was using the wrong service altogether.
Luckily, all it took was a quick search to realize that Lambda was a much better offering for my particular needs. Firstly, I was after the ‘doodle using pencil’ version of serverless applications vs. the ‘watercolor a canvas’ version. Secondly, I was more than happy to lose the fine-grained control and hand over the reins of infrastructure to AWS. I also found out that I could easily integrate Lambda with S3, AWS Amplify, Amazon API Gateway, and DynamoDB to build the entire application.
What is AWS Lambda?
- Lambda is a cloud computing service built around an event-driven architecture, meaning that external triggers invoke your code to get it to run.
- It offers a pay-as-you-go model. For the scope of my experimental project, being charged per request and duration made more sense than having to pay for every second my EC2 instance was running.
- Lambda also offers comprehensive yet abstracted control of the underlying resources, whereby AWS manages everything pertaining to the infrastructure (e.g., scaling, running security patches, OS configuration, etc.).
- Lambda is a Function as a Service (FaaS)—a category of cloud computing that enables you to focus solely on building, running, and testing responsive functions (units of code) to build serverless applications. In addition to abstracting the provisioning of the underlying resources, AWS takes care of auto-scaling the application to match the demand.
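Concretely, a Lambda function is just a handler that AWS calls with the trigger’s payload. Here is a minimal sketch in Python; the event shape below is a made-up example for illustration, not a fixed Lambda format:

```python
# A minimal Lambda handler: AWS invokes this function with an event
# (the trigger's payload) and a context object (runtime metadata).
import json

def lambda_handler(event, context):
    # Echo back the caller's name from the (hypothetical) event payload.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Locally, you can simulate an invocation by calling the handler directly:
print(lambda_handler({"name": "Lambda"}, None))
```

That is the whole unit of deployment: no server, no web framework, just a function and its dependencies.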
This abstraction might hinder you from certain customizations. But in my case, I didn’t need a customized ecosystem, nor did I want to deal with getting down to the granular details of the cluster.
Configuring the memory and the execution timeout in Lambda was more than enough for my project.
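As an illustration, both settings can be adjusted with a single AWS CLI call; the function name and values below are placeholders:

```shell
# Hypothetical example: set a Lambda function's memory (in MB) and
# execution timeout (in seconds). 'my-func' is a placeholder name.
aws lambda update-function-configuration \
  --function-name my-func \
  --memory-size 256 \
  --timeout 30
```

Note that memory is the only performance dial: CPU is allocated proportionally to the memory you configure.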
For the application’s backend, I wrote the data-processing code and hosted it on AWS Lambda, set up S3 to store the data and uploaded the necessary files, and monitored the logs that Lambda automatically sent to CloudWatch.
So three services were all I needed to build, host, test, and monitor the backend of a serverless project.
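To make that concrete, here is a hedged sketch of the S3-to-Lambda half of such a backend: assuming an S3 event notification is configured on the bucket, the event tells the handler which object to process.

```python
# Sketch of an S3-triggered Lambda: the S3 event notification carries the
# bucket and key of the uploaded object, which the handler then processes.
import urllib.parse

def parse_s3_event(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    records = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (e.g. spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        records.append((bucket, key))
    return records

def lambda_handler(event, context):
    for bucket, key in parse_s3_event(event):
        # In the real function you'd fetch and process the object here, e.g.:
        # body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
        print(f"New object: s3://{bucket}/{key}")
    return {"processed": len(event.get("Records", []))}
```

The `Records` structure above is the shape S3 event notifications actually use, which is why decoding the key is worth a helper of its own.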
For a comprehensive tutorial, check out this hands on tutorial for building a serverless web application that uses AWS Lambda and DynamoDB for the backend, AWS Amplify to host the static website, and Amazon API Gateway to handle users’ requests.
Not to be overly dramatic, but I was having a Spanish dancer emoji moment reading the Lambda documentation, especially having been through the pain of wrapping my head around everything to do with hosting a simple application on AWS.
But the skeptic in me couldn’t quite enjoy the moment, so I started pondering the catch. I wondered whether Lambda only supports an exclusive set of languages, or whether it has a steep learning curve. Or perhaps it is drastically slower than EC2.
Is AWS Lambda easy to learn?
Getting started with Lambda is easy and straightforward. However, there is a bit of a learning curve when it comes to writing code that is well suited for Lambda’s execution environment. To get started, you can try this 10-minute “Hello, World!” tutorial and judge for yourself.
Lambda’s simplicity is thanks to its key features that enable you to write, test and execute code on the fly. Some of these features include:
You can easily and quickly get acquainted with Lambda because the Lambda console provides an IDE-like environment. The GUI has a distinct section for authoring and testing the code, a tab for configuring the application, and a console to print out the results.
Function Blueprints and Pre-Built Applications
The Lambda console offers the following three ways by which you can create new functions:
- Author from scratch – as the name suggests, with this option, you get to create functions from scratch
- Blueprints – AWS offers Blueprints as ready-to-use templates pre-built with the necessary configurations. You can also use blueprints to see how to integrate Lambda with other AWS services and 3rd party libraries.
- AWS Serverless Application Repository – this option allows you to browse as well as share pre-built applications. You can either publicly share applications or privately share them with specific AWS accounts.
Lambda natively supports a variety of popular languages, including Java, Python, Node.js, and the .NET languages C# and PowerShell. Lambda also provides a Runtime API that lets you bring any additional programming language by implementing a custom runtime.
Lambda layers enable you to import external libraries (in the form of a .zip file) and other dependencies into your function. Layers can be third-party libraries, additional code, configuration files or even custom runtimes. And AWS offers its own set of libraries that you can use immediately. For instance, they have the AWS SDK for Pandas Layer that you can use to import pandas into your functions.
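For illustration, packaging your own layer is mostly a zip-and-upload exercise. The layer name, library, and runtime below are placeholder choices:

```shell
# Hypothetical example: bundle a third-party library as a Lambda layer.
# For Python, layer contents must sit under a top-level 'python/' directory
# so the runtime adds them to the import path.
mkdir -p layer/python
pip install requests -t layer/python
(cd layer && zip -r ../my-layer.zip python)

# Publish the layer, then attach it to functions via the returned ARN.
aws lambda publish-layer-version \
  --layer-name my-deps \
  --zip-file fileb://my-layer.zip \
  --compatible-runtimes python3.12
```

Layers keep the function’s own deployment package small and let several functions share one copy of heavy dependencies.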
By default, all Lambda functions export logs to AWS CloudWatch. Therefore, you can use AWS CloudWatch Lambda Insight metrics to monitor or troubleshoot your applications.
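In Python, for example, anything written via print or the logging module ends up in the function’s CloudWatch log group (which follows the /aws/lambda/&lt;function-name&gt; convention):

```python
# Log lines emitted here are captured by Lambda and shipped to the
# function's CloudWatch log group automatically, no agent required.
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    # Structured-ish log line; shows up in CloudWatch with a timestamp
    # and the invocation's request ID.
    logger.info("Received event with %d top-level keys", len(event))
    return {"ok": True}
```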
Is Lambda slower than EC2?
Lambda is slower than EC2 in the sense that it does not respond to events instantaneously. This behavior is inherent to the serverless model, which only runs code on demand. However, this kind of delay is unlikely to cause issues unless you’re building a highly latency-sensitive application.
Lambda functions are always available, but they’re not always running. Lambda only runs when an external event triggers it to do so. Think of it this way: you might be available all day, but you probably won’t check your Slack DMs unless an external event (think notifications) reminds you to.
Once invoked, a Lambda function typically starts executing your code within milliseconds, and a single invocation can run for up to 15 minutes. This is a hard limit, so if your function absolutely needs more than 15 minutes to run, consider EC2 or an alternative service.
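If you run close to that limit, the context object passed to every handler reports how much of the time budget remains, so long-running work can stop gracefully. A sketch, where the 10-second threshold and the doubling “work” are arbitrary stand-ins:

```python
# Use the context's remaining-time budget to bail out before the hard
# 15-minute cutoff, leaving time to checkpoint or hand off remaining work.
def lambda_handler(event, context):
    processed = []
    for item in event.get("items", []):
        # get_remaining_time_in_millis() is part of the real Lambda
        # Python context API; the 10s threshold is an arbitrary choice.
        if context.get_remaining_time_in_millis() < 10_000:
            break
        processed.append(item * 2)  # stand-in for real work
    return {"processed": processed}

# A minimal fake context for local testing:
class FakeContext:
    def get_remaining_time_in_millis(self):
        return 60_000
```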
A Lambda function might also incur extra latency (over 1 second) when a new function instance needs to be created and initialized. This startup delay is what “cold start” refers to: the time it takes to kick-start a new function instance. Depending on the type and urgency of the task, this may or may not be a problem. For more on this, check out our blog on How to Keep Your AWS Lambda Functions Warm. Also, check our summary of Lambda SnapStart, introduced at re:Invent 2022, which drastically cuts down cold start time for Java applications.
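A common application-side mitigation is to keep expensive one-time setup at module scope, outside the handler, so it runs once per cold start and is reused on every warm invocation. A minimal sketch, where the “config” stands in for real initialization such as creating SDK clients or loading a model:

```python
# Cold-start mitigation pattern: do one-time setup at import time.
# Module scope runs once per function instance; the handler runs per invoke.
import time

_START = time.monotonic()                 # set once, at cold start
EXPENSIVE_CONFIG = {"threshold": 0.5}     # stand-in for costly setup

def lambda_handler(event, context):
    # Warm invocations reuse _START and EXPENSIVE_CONFIG instead of
    # rebuilding them, so only the first request pays the setup cost.
    return {
        "warm_for_s": round(time.monotonic() - _START, 3),
        "threshold": EXPENSIVE_CONFIG["threshold"],
    }
```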
And now that we’re somewhat acquainted with AWS Lambda, let’s get back to our story, where I attempt to build a simple serverless application. Besides the fact that AWS Lambda met all my functional requirements as a developer, Lambda was well suited for my ultimate objective of collecting and processing data.
AWS Lambda and Data Processing
Data processing is simply the process of turning raw data into meaningful information. And if data is the new oil, data processing is the new refinery. And we can’t talk about data processing without diving into data lakes first. Organizations use data lakes and pipelines to store and refine the data.
A data lake is a centralized storage repository containing all the raw data in its native format. It extracts data from a wide array of data sources.
The data can be structured, semi-structured, or unstructured. The data serves different groups at different stages. Data scientists and engineers use non-curated data at its early stages for analytical purposes using tools such as Amazon Athena.
Afterward, the data goes through validating, cleaning, and other transformations to serve the business users and publish business reports.
Data processing is the umbrella term for the process of curating data, which can be anything from converting and compressing files to validating, transforming, enriching, or filtering the data. And this is where AWS Lambda comes in: Lambda is well suited for preprocessing and processing data in data pipelines. Besides processing data, Lambda can also extract and ingest it. For more on this, check out this project that uses Lambda to build an ETL (extract, transform, and load) pipeline. And for a more in-depth and complex example, watch Nextdoor’s 2017 AWS re:Invent talk to see how Nextdoor used AWS Lambda to streamline their ETL pipelines.
There are a few scenarios where you can use Lambda to process data:
- Applying simple transformations to the data, for example, substituting selected values with specific constants
- Converting and compressing files, for example, converting a CSV file to a JSON file or vice versa
- Compressing images to enhance performance and optimize costs on the cloud
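The file-conversion case, for instance, boils down to a pure transformation that a handler can apply to whatever it reads from S3. A sketch of CSV-to-JSON; the event shape is a simplifying assumption, since a real function would more likely fetch the CSV from S3:

```python
# CSV-to-JSON conversion as a Lambda-friendly pure function.
import csv
import io
import json

def csv_to_json(csv_text):
    """Convert CSV text (with a header row) to a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)

def lambda_handler(event, context):
    # Hypothetical event shape: the raw CSV arrives in the payload.
    # In practice, the handler would read it from the triggering S3 object.
    return {"body": csv_to_json(event["csv"])}
```

Keeping the transformation separate from the handler makes the core logic trivial to unit-test outside of Lambda.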
So to sum it all up: do you want to focus on your code, and does that code take less than 15 minutes to run? Does the idea of maintaining servers keep you up at night? Do you have difficulty justifying paying for idle time? If you answered yes to any of these questions, you should consider Lambda. And remember, you do not always need a project to tinker with AWS cloud computing offerings.
If you’re interested in learning more about Lambda’s building blocks, check out my course on Processing Serverless Data Using AWS Lambda. The course will teach you how to utilize AWS Lambda to serve your business needs as a data professional. You’ll unpack Lambda by learning how to use it to transform data and integrate it with other AWS services to build simple pipelines.