I spent the weekend fighting functions while trying to move a Flask application hosted on Elastic Beanstalk to AWS Lambda
After the elation of attending A Cloud Guru’s Serverlessconf in NYC, I was inspired to spend a weekend trying to implement what I learned from the workshops and speaker sessions. My goal was to move my simple web app that’s orchestrated in AWS Elastic Beanstalk to a serverless implementation in AWS Lambda. Let the weekend begin!
My simple web site for recipes is a Flask application that queries a MySQL database for recipes and allows users to connect using their Twitter accounts to create a shopping list.
This website is barely trafficked at all, so paying for the minimal required resources really irks me. I could be blowing that $10 a month on coffee. Right now, it has a domain name hosted with Route 53 and a load balancer that points to a single-instance deployment on Elastic Beanstalk. In the serverless world, this would cost (next to) nothing to host.
Let’s talk goals: my aim here is to lift-and-shift as much as possible. I know we can easily rewrite this in Node with DynamoDB, but we’re sticking with the Python/Flask combo since this tiny app is a proxy for bigger migrations. Flask has served us well and it’s so brainlessly simple to add routes and debug that we’d like it to survive this transition. It’s Saturday morning and I’m caffeinated to do this thing!
Zappa to the rescue … I think
It turns out there are a couple of frameworks designed to make it ‘easy’ to move Flask to Lambda — Zappa and Chalice — so we might be finished by lunchtime. After a couple of hours, it’s pretty clear that Chalice is a clone of Zappa that adds some automatic IAM magic but offers a tiny subset of its functionality and requires significant code changes to my app.
I ditch Chalice and focus on Zappa.
The Hello World example runs so well that my hopes are high — but the moment I attempt to deploy my app, I get the first error of the day. And not a friendly Python/Flask error, but one of those page-long AWS errors that makes me question my career choices.
By lunchtime, I’ve figured out that Zappa has a problem with reading files on Windows, so I hack a UTF-8 default directly into the offending Zappa python file to keep on moving. I love the smell of technical debt in the morning.
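The UTF-8 hack comes down to how Python opens text files on Windows: without an explicit encoding, open() falls back to the locale’s code page rather than UTF-8. The sketch below is not Zappa’s code. Both helper names are hypothetical, and it only illustrates passing the encoding explicitly instead of relying on the platform default.

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the platform's default encoding."""
    # Hypothetical helper, not part of Zappa: the explicit encoding avoids
    # falling back to the Windows locale code page.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def write_sample_file(text):
    """Write text to a temporary UTF-8 file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


sample_path = write_sample_file("crème brûlée")
print(read_text_utf8(sample_path))
os.remove(sample_path)
```

Patching a library file in place works for a weekend, but the change is lost on the next upgrade, which is exactly the technical debt mentioned above.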
Gah, more errors:
- Zappa requires an insanely complex series of IAM permissions which nobody seems to have documented in one place. There are fragments here and there but I ended up building an IAM policy from hell to make this work. I avoided the obvious “Full Administrator Access” approach in an attempt to learn something and do things the right way. My bad, clearly.
- Lambda doesn’t seem to like the mysqlclient Python library. All the AWS examples I can find use pymysql so I spend a few minutes switching them out. Lunchtime is over and I’m staring down the barrel of a long Lambda afternoon.
- I run zappa deploy dev and wait two minutes. It uploads, I see green in the console instead of red and get an API endpoint. Success! I cut and paste the URL in my browser and wait, and wait… and it times out. How fleeting my victory. This doesn’t seem very significant, but it marks the beginning of a goose chase of the wild variety that sinks the rest of the day.
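On the driver swap in the list above: mysqlclient (imported as MySQLdb) is a C extension, while PyMySQL is pure Python. That may be why PyMySQL packages more easily for Lambda, though this is an assumption, since the failure itself was never diagnosed here. A minimal sketch of preferring PyMySQL and falling back if it isn’t installed, where load_mysql_driver is a hypothetical helper name:

```python
import importlib


def load_mysql_driver(candidates=("pymysql", "MySQLdb")):
    """Return the first importable MySQL DB-API module, or None if none is installed."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


driver = load_mysql_driver()
if driver is None:
    print("No MySQL driver installed; add PyMySQL to the deployment package.")
else:
    print("Using MySQL driver:", driver.__name__)
```

Both modules expose a DB-API connect() function, so the rest of the app should need few changes beyond the import.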
VPCs, Lambda, RDS and OMG
Using RDS in Lambda is something of an anti-pattern since you’re pushing the scaling bottleneck down the stack to the database. But the reality is that many, if not most, web apps use SQL databases — so I resist the urge to shift to DynamoDB and continue to figure this out. For little-used hobby sites like mine, the RDS option is still valid.
The nut of the problem is that RDS is a resource that uses a VPC. In order for your Lambda function to see the database, it too must live in a VPC. You can set this up in the network section of your Lambda function.
At this point, if you haven’t completed any form of AWS certification, you probably should stop and get some help. Fortunately, I’m certified — so can comfortably confuse myself without any assistance!
After configuring the necessary security group settings, I refresh the URL and the page loads. But sadly all the menu links fail with a menacing Forbidden error. Some quick Googling reveals that the API endpoint created in Zappa includes the stage name, which breaks all the routes in the Flask app.
Focusing on the positive, at least the home page works.
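The stage-name problem can be reproduced without deploying anything. The sketch below assumes the app’s menu uses root-relative links such as /recipes (the article doesn’t show them). The endpoint and custom domain are made-up placeholders, and urljoin stands in for what a browser does when a link is clicked.

```python
from urllib.parse import urljoin

# Hypothetical API Gateway endpoint; "dev" is the Zappa stage name.
STAGE_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/dev/"
# Hypothetical custom domain mapped to the same stage.
CUSTOM_DOMAIN_URL = "https://recipes.example.com/"


def resolve_link(page_url, href):
    """Resolve an href against the current page URL, as a browser would."""
    return urljoin(page_url, href)


# A root-relative link replaces the whole path, so the stage prefix is lost.
broken = resolve_link(STAGE_URL, "/recipes")
# On a custom domain there is no stage prefix to lose.
working = resolve_link(CUSTOM_DOMAIN_URL, "/recipes")

print(broken)
print(working)
```

The first URL no longer contains /dev, so the request never reaches the deployed stage. That is consistent with the Forbidden error on every menu link while the home page still loads.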
Fortunately, this issue can be resolved by using a custom domain name. I use Route 53 and Certificate Manager to quickly set up the domain, and take the usual 30 minute break while CloudFront does whatever it does. This is my late lunch-slash-dinner that gives me a few minutes to clear my mind before entering round two.
After the deployment completes, the app now loads all the routes successfully and it’s looking like I’m almost done. Almost.
I discover that my Twitter OAuth is timing out when clicked, and that’s a shame since it’s a critical part of the app. Well, there’s nothing wrong in the code — but after delving into cryptic error messages and setting log messages everywhere, it looks like my app can no longer reach Twitter’s API.
Frustratingly, once a Lambda function is in the VPC needed for RDS, it doesn’t have Internet access. Hark, the main documentation for this minor but important point is in light grey text in the Network section of the Lambda function.
Now I need a NAT Gateway at $0.045 per hour or $32 a month which isn’t meeting my expected cost of zero. This all seems wrong. I creep off to bed and see if I can work my way around this tomorrow.
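The monthly figure is just the hourly rate running around the clock. A quick check of the arithmetic, using only the $0.045 hourly price quoted above and ignoring any per-gigabyte data processing charges:

```python
HOURLY_RATE_USD = 0.045  # NAT Gateway hourly price quoted in the article
HOURS_PER_DAY = 24


def monthly_cost(hourly_rate, days):
    """Cost of running a resource around the clock for the given number of days."""
    return hourly_rate * HOURS_PER_DAY * days


thirty_day_month = monthly_cost(HOURLY_RATE_USD, 30)
print(f"${thirty_day_month:.2f} per 30-day month")
```

That works out to roughly $32 for a 30-day month, more than three times the $10 the whole Elastic Beanstalk setup costs today.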
Sunday morning: Back on the horse
It’s 7am, 55 degrees, and the birds are singing outside — but my screen has stayed fixed on this timeout error like I’ve never left. I need a few seconds to regroup and remind myself what we’re doing here.
I have a Flask app that talks to RDS and uses a simple Twitter OAuth to get an ID from a visitor — and we want to move this all to Lambda. That’s it. To get around this VPC issue, other than paying for a NAT gateway, I see a few options ahead:
- Ditch my Flask OAuth library and the dozen lines of code that do the magic, and move to AWS Cognito. Since this is a managed service internal to Amazon, I would not need Internet access for the Lambda function.
- Ditch RDS and move the data to DynamoDB. This would eliminate the need for the VPC completely so I can keep my existing OAuth solution.
- Separate the database logic out to its own Lambda function (inside a VPC) and the rest of the app to a function outside a VPC. This seems promising.
The app only has a handful of very simple database queries that fetch recipes, work out which ones have been selected by the user, and calculate the shopping list. Within an hour, I’ve built a proxy Lambda function that will return these result sets, packaged the zip and libraries manually (urgh) and it’s all looking good.
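The proxy function itself isn’t shown in the article, so the following is only a sketch of what such a function might look like. The query name, table and column names, and environment variable names are all assumptions, and PyMySQL is assumed to be bundled in the zip.

```python
import os

# Hypothetical whitelist: the caller names a query rather than sending raw SQL.
QUERIES = {
    "list_recipes": "SELECT id, name FROM recipes ORDER BY name",
}


def run_named_query(connection, query_name):
    """Run a whitelisted query and return the rows as a list of dictionaries."""
    cursor = connection.cursor()
    try:
        cursor.execute(QUERIES[query_name])
        columns = [column[0] for column in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()


def get_connection():
    """Open a MySQL connection from environment variables (names are assumptions)."""
    import pymysql  # imported lazily; must be bundled in the deployment zip

    return pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )


def handler(event, context):
    """Lambda entry point: expects an event such as {"query": "list_recipes"}."""
    connection = get_connection()
    try:
        return run_named_query(connection, event["query"])
    finally:
        connection.close()
```

Naming queries instead of passing SQL through keeps the proxy from becoming a general-purpose database endpoint. Note that Lambda serializes the return value as JSON, so columns holding dates or decimals would need converting first.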
I create a test harness function with no VPC that calls this database function and it’s returning Python-ready dictionaries of rows of data. It’s looking good! I return to my main application and update the route that lists all the recipes — having it call the Lambda function instead of calling the database:
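The listing that belongs here is not reproduced in this copy of the article, so what follows is not the original code. It is a minimal sketch of calling a Lambda function synchronously through boto3, assuming a hypothetical function name (recipes-db-proxy) and a hypothetical event shape ({"query": ...}).

```python
import json


def fetch_rows(query_name, client=None, function_name="recipes-db-proxy"):
    """Invoke the database proxy function synchronously and decode its JSON result."""
    if client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access

        client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,  # hypothetical function name
        InvocationType="RequestResponse",  # wait for the result instead of firing and forgetting
        Payload=json.dumps({"query": query_name}).encode("utf-8"),
    )
    # The response payload is a stream containing the function's JSON return value.
    return json.loads(response["Payload"].read())
```

In a Flask route, the view would call fetch_rows("list_recipes") where it previously ran the SQL query, and pass the resulting list of dictionaries to the template unchanged.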
All I want at this point is to see that the same query results are returned from the Lambda function. I do a zappa update dev and wait…
Behold! It doesn’t work!
Playing with this, it looks like Zappa starts to wobble when you call Lambda functions through boto3. I’m not sure why — but only when I remove the boto3 references to Lambda does the deploy operation work again. I try debugging for an hour and get absolutely nowhere.
The more I think this through, it definitely feels kluge-y having Python call a SQL-proxy Lambda this way. I cut my losses and look at the next option — Cognito!
Ergo cognito dumb
I’m excited about this option. Cognito answers your prayers for user management, offering user pools, Federated Identity management and all manner of goodness. In principle, I can dump OAuth, integrate with Cognito and the need for a NAT Gateway goes away.
Except — not in this case — as I discover over the next couple of hours.
I have so many tabs open in Chrome right now, I can barely see the titles. I haven’t used Cognito before so I’m surprised to find …
- The documentation is weak, with no comprehensive examples specific to web apps.
- It’s too big a solution for basic authentication. In the case of this app, we only need a Twitter ID to track the recipes the user selects — we don’t need IAM permissions, user databases or anything remotely complex.
For a Node or JS app written from scratch for serverless, Cognito would undoubtedly be the solution but for my Python-heavy Flask app, it’s not clear how this is going to work at all.
Googling around for Python Cognito examples, it’s quickly obvious that I’m not the first person to ask this question.
Biting bullets and moving away from RDS
If I eliminate RDS, I solve a couple of problems. One is the scaling bottleneck issue of traditional databases, and the second is the need for the VPC, which solves my Internet access problem for OAuth.
But the difficulty is in porting the tables across to DynamoDB. The recipe table has its ingredients broken out into a separate table, and there’s a third table tracking which users have selected recipes.
It goes from a handful of tables in the relational world to two NoSQL tables — adding some aggregation logic to the backend. On the surface, it’s a minor migration, but it does result in rewriting how the main application queries and handles the data.
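To give a feel for that aggregation logic, here is a rough sketch of folding recipe and ingredient rows into one nested item per recipe, the shape a document-style table would hold. The column names (id, name, recipe_id, quantity) are hypothetical, since the real schema isn’t shown.

```python
from collections import defaultdict


def build_recipe_items(recipe_rows, ingredient_rows):
    """Fold relational recipe and ingredient rows into one nested item per recipe."""
    ingredients_by_recipe = defaultdict(list)
    for row in ingredient_rows:
        ingredients_by_recipe[row["recipe_id"]].append(
            {"name": row["name"], "quantity": row["quantity"]}
        )
    return [
        {
            "recipe_id": recipe["id"],
            "name": recipe["name"],
            # Recipes without ingredient rows get an empty list.
            "ingredients": ingredients_by_recipe[recipe["id"]],
        }
        for recipe in recipe_rows
    ]


recipes = [{"id": 1, "name": "Chili"}, {"id": 2, "name": "Toast"}]
ingredients = [
    {"recipe_id": 1, "name": "beans", "quantity": "2 cans"},
    {"recipe_id": 1, "name": "onion", "quantity": "1"},
]
for item in build_recipe_items(recipes, ingredients):
    print(item)
```

Each resulting dictionary could then be written as a single item, which removes the join at read time but means the shopping-list calculation has to walk the nested ingredients in application code.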
It’s a little deflating — but this was how the weekend ended. I covered a fair amount of ground, but didn’t succeed in what I hoped would be an easy task.
Rushing to conclusions
Much of this is a work in progress, and I may well be wrong about a number of things in this article. I’m still feeling around in the dark and throwing out my conclusions here in good faith — hoping they spur some conversation.
Flask doesn’t belong
Writing Flask applications is very easy, and I can do some fairly complex things quickly in this environment. But porting a simple Flask app to Lambda seems like the wrong approach and Flask doesn’t really belong there. Even if this had worked, Zappa ends up creating a monolithic function that flies in the face of serverless architecture.
This is contrary to my experience with Elastic Beanstalk, which makes it incredibly easy to deploy web apps, and works hand-in-glove with WSGI-based frameworks like Flask, Bottle and Pyramid. But in moving to serverless, I think we need to use what serverless provides, and not retrofit something beyond its intended use.
Documentation is sparse
As with so many AWS services, the documentation is sparse. In the past, I’ve battled through S3 CORS issues, CloudFront distribution configurations and undocumented Elastic Beanstalk features to eventually understand and confidently solve issues. But I drew a blank this weekend in this exercise and still have many unanswered questions.
Need more patterns
In coding, I’ve always found I learn by tinkering with solid examples but it seems hard to find anything concrete here. I’ve no doubt about the robustness of these services but Amazon could definitely throw us a bone on tutorials, examples and best practices.
The TL;DR version of lifting and shifting Python web apps to serverless:
- The combination of services used by your application (RDS + Flask + OAuth in my case) may not work in serverless as easily as expected.
- A lift-and-shift might require enough modification to warrant a rewrite, even for apparently simple apps.
- The documentation (specifically the lack of it) can be a major time-suck in chasing down arcane errors.
While I’m continuing to focus on serverless as our prime delivery mechanism, I’d be cautious about treating any migration as trivial. Starting over might actually be easier, but that’s my project for another weekend.
I’d be interested in hearing your thoughts in the comments below!