A serverless queue can decouple your reply from the initial request to prevent timeouts caused by constrained downstream systems
In a recent post, we explored various ways to implement the push-pull pattern. Now it’s time to examine another useful pattern: Decoupled Invocation.
Decoupled Invocation is one of an extensive set of patterns that Arnon Rotem-Gal-Oz provided in his book SOA Patterns. The entire pattern could be summed up as “use queue for request and response.”
In essence, decouple the reply from the initial HTTP request.
The Decoupled Invocation pattern is extremely useful when:
- we are using an API that is more scalable than its downstream systems — such as a stateless API backed by AWS Lambda
- we are using an API that needs to perform expensive, time-consuming processing in order to respond to the request
In both of these common use cases, it’s very likely that your API would time out or return an error as a result of the delayed response.
In situations where an API is dependent on downstream systems that don’t proportionally scale, the inbound requests will fail as the constrained downstream systems are unable to keep pace with the higher throughput.
We have received your request — we’ll get back to you ASAP
To decouple the request from the response, the Decoupled Invocation pattern sends the API request to a separate worker task. In response, the API would return a 202 ACCEPTED to the client, along with the location of the worker task results.
While the request is being processed, the client would periodically poll the API location for the result. In the meantime, the API would continue to return 202 ACCEPTED to the client. Once the task is processed and the status changes, the API would return a 200 OK and the results in the response.
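From the client's side, the polling loop described above could look roughly like the sketch below. The `fetch` callable is a hypothetical stand-in for whatever HTTP client you use, and the interval and attempt limit are illustrative defaults rather than recommendations.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: takes a URL, returns (status_code, parsed_body).
Fetch = Callable[[str], Tuple[int, dict]]


def poll_for_result(fetch: Fetch, location: str, interval_s: float = 1.0,
                    max_attempts: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll `location` until the API stops answering 202 ACCEPTED."""
    for _ in range(max_attempts):
        status, body = fetch(location)
        if status == 200:      # task finished, body carries the result
            return body
        if status != 202:      # anything else is treated as a failure
            raise RuntimeError(f"unexpected status {status}")
        sleep(interval_s)      # still processing, wait before asking again
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

Passing `sleep` in as a parameter keeps the loop easy to test without real delays.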
By removing the urgency to reply to the caller right away, we can process the tasks at a steady rate that will not cause problems downstream.
Decoupled Invocation amortizes spikes in traffic.
Removing the urgency to respond also enables more flexible retry strategies when we encounter transient issues with the downstream systems.
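As one possible retry strategy, the worker could wrap calls to a downstream system in exponential backoff with jitter. This is a minimal sketch and not something the pattern prescribes: `TransientError` and `call_with_backoff` are hypothetical names, and the attempt count and base delay are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (throttling, timeouts)."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on TransientError with jittered exponential delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller decide what happens next
            # Full jitter: wait a random time between 0 and base * 2^attempt.
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    raise AssertionError("unreachable")
```

Because nobody is blocked waiting on the HTTP response, the worker can afford delays like these that a synchronous API could not.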
Additionally — because we’re able to quickly respond to the initial request with an acknowledgement — this approach allows the client to be smarter about how to communicate the status of the task to the user.
For example, you can return a status indicator similar to what is used by WhatsApp, now a standard practice with most messaging apps.
This approach is a common method for informing users when their messages are delivered or read — both of which are asynchronous processes.
Stick a queue in there
Upon receiving the original request, the API can store a record for the request in a DynamoDB table, along with a created_at timestamp. The API would also queue up a task in either SQS or Kinesis/DynamoDB Streams.
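A minimal sketch of that submit step might look like this. The storage and queue calls are injected as plain callables so the example runs without AWS; the record fields other than `created_at`, the `/tasks/{id}` route, and the function name `handle_submit` are all assumptions for illustration.

```python
import json
import time
import uuid
from typing import Callable


def handle_submit(payload: dict,
                  put_record: Callable[[dict], None],
                  enqueue: Callable[[str], None]) -> dict:
    """Store the request, queue the work, and acknowledge with 202."""
    task_id = str(uuid.uuid4())
    record = {
        "task_id": task_id,
        "status": "PENDING",
        "request": payload,
        "created_at": int(time.time()),  # epoch seconds, used later for timeouts
    }
    put_record(record)                          # e.g. a DynamoDB put
    enqueue(json.dumps({"task_id": task_id}))   # e.g. an SQS message body
    return {
        "statusCode": 202,
        "headers": {"Location": f"/tasks/{task_id}"},  # hypothetical status route
        "body": json.dumps({"task_id": task_id, "status": "PENDING"}),
    }
```

With boto3 you could plug in something like `lambda r: table.put_item(Item=r)` and `lambda m: sqs.send_message(QueueUrl=queue_url, MessageBody=m)`, where `table` and `queue_url` are your own resources.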
I have purposely omitted SNS here — I think it’s a poor choice in this situation. The invocation-per-message policy of SNS would not allow us to amortize any spikes in traffic.
Simple Queue Service
SQS is finally a supported event source for AWS Lambda! You start with 5 pollers (which are managed by AWS), and your function would receive batches of up to 10 messages at a time. This means you will start off with up to 5 concurrent executions of your processor function.
As throughput increases, AWS would automatically increase the number of pollers, hence increasing the number of concurrent executions of your function. This maxes out at 1000, or the function concurrency limit, whichever is lower.
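To make the batch shape concrete, here is a rough sketch of a processor function for an SQS event source. Lambda delivers the batch under `Records`, with each message's payload as a string in `body`. The `process_task` and `save_result` callables, and the `{"task_id": ...}` message shape, are assumptions for this example.

```python
import json
from typing import Callable


def make_handler(process_task: Callable[[str], dict],
                 save_result: Callable[[str, dict], None]):
    """Build a Lambda handler that works through one SQS batch."""
    def handler(event: dict, context: object = None) -> dict:
        records = event.get("Records", [])
        for record in records:
            message = json.loads(record["body"])   # assumed JSON message body
            task_id = message["task_id"]
            result = process_task(task_id)          # the slow, expensive work
            save_result(task_id, result)            # e.g. mark the record as done
        return {"processed": len(records)}
    return handler
```

If the handler raises, the whole batch is treated as failed and its messages become visible on the queue again, so the work in `process_task` should be safe to repeat.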
Check out my previous post on the pub-sub and push-pull messaging patterns for a more in-depth discussion about the differences between SNS, SQS and Kinesis/DynamoDB Streams.
To ensure that the client doesn’t poll indefinitely, the created_at timestamp can be used to time out the request. The length of the timeout can be adjusted to align with an SLA, or expectations on the maximum allowable wait time for such a request.
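The status endpoint could apply that timeout along these lines. This is a sketch under assumptions: the five-minute `TIMEOUT_S`, the record fields `status` and `result`, and the choice of 504 for an expired task are illustrative and not prescribed by the pattern.

```python
import json
import time
from typing import Optional

TIMEOUT_S = 300  # assumed SLA: give up after five minutes


def handle_status(record: Optional[dict], now: Optional[float] = None) -> dict:
    """Answer a poll: 200 when done, 202 while pending, error once expired."""
    now = time.time() if now is None else now
    if record is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown task"})}
    if record["status"] == "DONE":
        return {"statusCode": 200, "body": json.dumps(record["result"])}
    if now - record["created_at"] > TIMEOUT_S:
        # 504 is one possible choice for an expired task.
        return {"statusCode": 504, "body": json.dumps({"error": "task timed out"})}
    return {"statusCode": 202, "body": json.dumps({"status": record["status"]})}
```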
Alternatively, you can use either Kinesis or DynamoDB Streams. Both are good choices in my opinion — although both have their own caveats.
By using DynamoDB Streams, you can simplify the API by only writing to the DynamoDB table — and rely on DynamoDB Streams to trigger the background worker.
With DynamoDB as your queue, simply include the original request in the row and make it available to the worker — just make sure the worker ignores row updates after it saves the result.
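A worker triggered by DynamoDB Streams could skip row updates by checking each record's `eventName`, as in the sketch below. It assumes the stream is configured to include new images, and that the row has string attributes named `task_id` and `request` (hypothetical names); `process_request` stands in for your own processing logic.

```python
from typing import Callable


def make_stream_handler(process_request: Callable[[str, str], None]):
    """Build a Lambda handler that reacts only to newly inserted rows."""
    def handler(event: dict, context: object = None) -> int:
        handled = 0
        for record in event.get("Records", []):
            # Only new rows are work items. MODIFY events include the worker's
            # own result write, so reacting to them would reprocess the task.
            if record.get("eventName") != "INSERT":
                continue
            image = record["dynamodb"]["NewImage"]   # DynamoDB-typed attributes
            task_id = image["task_id"]["S"]
            request = image["request"]["S"]          # assumes request stored as a string
            process_request(task_id, request)
            handled += 1
        return handled
    return handler
```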
If you’re dealing with a really long-running process that has several distinct subtasks, then you can get even fancier. Instead of a binary state of done or not done, you can report back with % completed. As the worker completes each subtask, it can increment the % so the user receives more frequent feedback on the progress of their task.
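Progress reporting of this kind can be sketched as a loop that reports a percentage after each subtask. The `report` callable is a hypothetical hook; in practice it might update a progress attribute on the task's record so the status endpoint can return it.

```python
from typing import Callable, Sequence


def run_subtasks(subtasks: Sequence[Callable[[], None]],
                 report: Callable[[int], None]) -> None:
    """Run subtasks in order, reporting whole-number % completed after each."""
    total = len(subtasks)
    report(0)  # the task has been picked up but nothing is finished yet
    for done, subtask in enumerate(subtasks, start=1):
        subtask()
        report(done * 100 // total)
```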
TL;DR: Consider adopting the Decoupled Invocation pattern when performing expensive and time-consuming tasks in response to an HTTP request, or if your API layer is constrained by downstream dependencies.