Automatically Processing Data in S3 Using Lambda

45 minutes
  • 4 Learning Objectives

About this Hands-on Lab

Automating processes is a cornerstone of cloud computing. If data needs to be processed once it is uploaded to the cloud, automating that step saves you from repeating the work for every new file. With AWS, this automation is straightforward to set up. In this lab, we’ll transcribe audio into text whenever an audio file is uploaded to S3. Magic!

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create an IAM Role
  1. Log in to the AWS Console.
  2. Create an IAM role that will allow Lambda to:
    • Read and write to S3
    • Start jobs in Transcribe
    • Create logs in CloudWatch
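
The three permissions above could be written as an IAM policy document along the following lines. This is a minimal sketch, not the lab's official policy: the function name `build_lambda_policy`, the bucket name `meeting-audio-example`, and the exact list of actions are assumptions chosen for illustration. AWS managed policies would also satisfy the objective.

```python
import json


def build_lambda_policy(bucket_name):
    """Return an IAM policy document (as a dict) covering the three needs above.

    bucket_name is a hypothetical placeholder for the lab's audio bucket.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read the uploaded audio and write the transcript back to S3
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {   # Start jobs in Transcribe
                "Effect": "Allow",
                "Action": ["transcribe:StartTranscriptionJob"],
                "Resource": "*",
            },
            {   # Create logs in CloudWatch
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


# Trust policy that lets the Lambda service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(build_lambda_policy("meeting-audio-example"), indent=2))
```

Scoping the S3 statement to a single bucket rather than `*` keeps the role close to least privilege. Widen it only if the function needs other buckets.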
Create a Lambda Function
  1. Create a new Lambda function that will handle events from S3. Be sure to use the IAM role you just created.
  2. Each event should create a new job in Transcribe that will transcribe the audio to text and store the result back in the S3 bucket that triggered the event.
    • Hint: Transcribe jobs must be uniquely named.
  3. Output the name of the Transcribe job to CloudWatch.
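
One way the function described above might look is sketched here. It is an illustration rather than the lab's reference solution. The helper `build_job_request`, the `transcribe-` job-name prefix, the `en-US` language code, and the optional `transcribe` parameter (added so the handler can be exercised without AWS access) are all assumptions. The boto3 call `start_transcription_job` and its parameters are part of the real Transcribe client.

```python
import urllib.parse
import uuid


def build_job_request(record, language_code="en-US"):
    """Turn one S3 event record into arguments for start_transcription_job."""
    bucket = record["s3"]["bucket"]["name"]
    # S3 URL-encodes object keys in event notifications, so decode them first.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {
        # A random UUID keeps every job name unique, as the hint requires.
        "TranscriptionJobName": f"transcribe-{uuid.uuid4()}",
        "LanguageCode": language_code,
        "MediaFormat": "mp3",
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        # Store the transcript in the bucket that triggered the event.
        "OutputBucketName": bucket,
    }


def lambda_handler(event, context, transcribe=None):
    """Start one Transcribe job per uploaded object and log each job name."""
    if transcribe is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        transcribe = boto3.client("transcribe")

    job_names = []
    for record in event["Records"]:
        request = build_job_request(record)
        transcribe.start_transcription_job(**request)
        # Anything printed by a Lambda function ends up in CloudWatch Logs.
        print(f"Started Transcribe job: {request['TranscriptionJobName']}")
        job_names.append(request["TranscriptionJobName"])
    return {"jobs": job_names}
```

Lambda itself only passes `event` and `context`, so the third parameter stays at its default in production and the real client is created.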
Create an S3 Bucket
  1. Create a new S3 bucket to hold the meeting audio.
    • Since the audio will contain secret company information, ensure the bucket can’t be accessed by anyone outside the company.
    • The meeting audio should be encrypted at rest to improve security.
  2. Create an event notification for uploaded audio files that will trigger your Lambda function.
    • Hint: It’s very important to only send audio data to your function since we’ll be automatically adding text data back to this bucket via the Transcribe job. You don’t want an infinite loop of Lambda calls.
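
The bucket requirements above (no public access, encryption at rest, and a trigger limited to audio files) can also be applied with boto3 instead of the console. The sketch below assumes SSE-S3 (`AES256`) encryption and a `.mp3` suffix filter. The function names and the way settings are grouped are hypothetical. The three `put_*` calls are real S3 client methods.

```python
def build_bucket_settings(lambda_arn, suffix=".mp3"):
    """Build the three configuration payloads for the audio bucket."""
    return {
        # Block every form of public access to the bucket.
        "public_access_block": {
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
        # Encrypt new objects at rest by default (SSE-S3 assumed here).
        "encryption": {
            "Rules": [
                {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
            ]
        },
        # Only objects ending in the suffix trigger the function, so the
        # JSON transcripts written back by Transcribe do not cause a loop.
        "notification": {
            "LambdaFunctionConfigurations": [
                {
                    "LambdaFunctionArn": lambda_arn,
                    "Events": ["s3:ObjectCreated:*"],
                    "Filter": {
                        "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                    },
                }
            ]
        },
    }


def apply_bucket_settings(s3, bucket, settings):
    """Apply the settings using an S3 client such as boto3.client('s3')."""
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=settings["public_access_block"],
    )
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=settings["encryption"],
    )
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=settings["notification"],
    )
```

When the notification is created outside the console, S3 also needs permission to invoke the function (a resource-based `lambda:InvokeFunction` permission). The console normally adds this when you create the trigger there.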
Automatically Transcribe Data
  1. Upload the ImportantBusiness.mp3 audio file provided with the lab to the S3 bucket.
  2. View the logs for the Lambda function in CloudWatch.
  3. View the details of the job created in Transcribe.
  4. Attempt to view the transcript in a browser via the S3 URL. Because the bucket blocks public access, expect this request to be denied.
  5. Download the transcript to view the results.
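
Once the transcript file has been downloaded, the spoken text can be pulled out of the JSON with a few lines of Python. This sketch assumes the standard Transcribe output layout, where the full text sits under `results.transcripts[].transcript`. The function name and the sample document are made up for illustration.

```python
import json


def extract_transcript_text(transcript_json):
    """Return the plain text contained in a Transcribe output document."""
    data = json.loads(transcript_json)
    # Each entry under results.transcripts holds a block of transcribed text.
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])


if __name__ == "__main__":
    # Hypothetical, heavily shortened example of a downloaded transcript.
    sample = json.dumps(
        {
            "jobName": "transcribe-example",
            "results": {
                "transcripts": [{"transcript": "Hello team."}],
                "items": [],
            },
        }
    )
    print(extract_transcript_text(sample))
```

When `OutputBucketName` is set without an output key, Transcribe names the file after the job (`<job name>.json`), so the job name logged in CloudWatch tells you which object to download.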

Additional Resources

Scenario

Our company has begun recording meetings to improve transparency throughout the organization. The recordings are currently stored as MP3 audio files, so people can go back and listen to a meeting to confirm project goals, agreements made, and design specifications. However, with so many meetings, we generate many hours of audio every day, and finding a specific clip by hand takes considerable effort. Having transcripts of the meetings would make searching for important information much easier.

You have been tasked with creating a pipeline that will store the data in the cloud. A transcript of the meeting should be automatically generated and stored with the audio file for future reference.

Code and data for this lab can be found on GitHub.

Lab Goals

  1. Create an S3 Bucket
  2. Create an IAM Role
  3. Create a Lambda Function
  4. Create an Upload Trigger
  5. Automatically Transcribe Data

Logging in to the Lab Environment

To avoid issues with the lab, use a new Incognito or Private browser window to log in to the lab. This ensures that your personal account credentials, which may be active in your main window, are not used for the lab.

Log in to the AWS console using the account credentials provided with the lab. Please make sure you are in the us-east-1 (N. Virginia) region when in the AWS console.

What Are Hands-on Labs?

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
