Classifying Text Using TensorFlow

1.5 hours
  • 4 Learning Objectives

About this Hands-on Lab

In this lab, you will build a natural language processing model using TensorFlow that classifies text snippets into one of several categories. You will perform the entire model creation process, from retrieving and formatting the data to designing a model architecture and training it to meet a target metric score.

This lab is designed as a practice exam to test your skills in preparation for the TensorFlow Developer Certificate, so it is a very challenging exercise.

Before beginning this lab, you should have PyCharm installed on your local computer. Additionally, you should have installed all packages required by the TensorFlow Developer Certificate exam.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Retrieve and Load the ag_news_subset Datasets
  1. Explore the ag_news_subset dataset in the TensorFlow Datasets Catalog.
  2. Retrieve and load the training and testing data.
Convert the Headlines to Token Sequences
  1. Review the problem description to determine the parameters you’ll need for the token sequences.
  2. Parse the article titles and labels from the datasets, converting them to the proper data types.
  3. Fit the Tokenizer to the training data.
  4. Convert the article titles to tokens.
  5. Convert the tokens to sequences.
Build and Train a Model to Classify the Headlines
  1. Review the model expectations to understand how your model should accept and output data.
  2. Create an appropriate neural network model using Keras.
  3. Compile your model with the correct loss function for the problem and label type.
  4. Train your model to reach the desired accuracy. Remember to capture the history!
  5. Save your model.
Evaluate Your Model with the Test Data
  1. Generate model statistics on the test data. Ensure you’ve met or exceeded the desired accuracy.
  2. Plot your model’s accuracy and loss for the training process.

Additional Resources

Scenario

You need to build a model to predict the category of a news article based on its headline.

We will be using the AG News Subset data, which can be loaded from the TensorFlow Datasets Catalog.
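As a starting point, the retrieval step might look like the sketch below. It assumes the `tensorflow_datasets` package is installed and that each example exposes `title` and `label` features, as listed in the catalog entry. The function names `load_ag_news` and `extract_titles_and_labels` are hypothetical helpers, not part of the lab guide.

```python
import numpy as np


def load_ag_news():
    """Download (on first call) and return the train and test splits."""
    # Imported lazily so the parsing helper below can be used on its own.
    import tensorflow_datasets as tfds

    train_ds, test_ds = tfds.load("ag_news_subset", split=["train", "test"])
    # as_numpy yields plain dicts of NumPy values instead of tensors.
    return tfds.as_numpy(train_ds), tfds.as_numpy(test_ds)


def extract_titles_and_labels(examples):
    """Split an iterable of example dicts into (titles, labels)."""
    titles, labels = [], []
    for example in examples:
        # String features arrive as bytes; decode them to Python str.
        titles.append(example["title"].decode("utf-8"))
        labels.append(int(example["label"]))
    return titles, np.array(labels, dtype=np.int32)
```

A typical call would be `train_examples, test_examples = load_ag_news()` followed by `extract_titles_and_labels(train_examples)`. The first call downloads the dataset, so expect it to take a while.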

The model will be tested on the tokenized test data, using a vocabulary of the 7,000 most frequent tokens in the raw training data. Sequences will be padded or truncated to 24 tokens, which is the maximum token length of a training headline.
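The tokenization parameters above could be applied roughly as follows. This is a minimal sketch using the Keras `Tokenizer` and `pad_sequences`. The sample headlines are made up, and the `oov_token` and the `"post"` padding and truncation settings are assumptions, since the scenario does not specify them.

```python
import tensorflow as tf

VOCAB_SIZE = 7000  # most frequent training tokens to keep (from the scenario)
MAX_LENGTH = 24    # pad or truncate every headline to this many tokens

# Hypothetical stand-ins for the parsed training and test headlines.
train_titles = [
    "Stocks rally as oil prices fall",
    "Home team wins the cup final",
    "New chip doubles laptop battery life",
]
test_titles = ["Oil prices rally again"]

# Fit the vocabulary on the training headlines only.
# The out-of-vocabulary token is an assumption, not a lab requirement.
tokenizer = tf.keras.preprocessing.text.Tokenizer(
    num_words=VOCAB_SIZE, oov_token="<OOV>"
)
tokenizer.fit_on_texts(train_titles)


def to_padded(titles):
    """Convert headlines to fixed-length integer sequences."""
    sequences = tokenizer.texts_to_sequences(titles)
    return tf.keras.preprocessing.sequence.pad_sequences(
        sequences,
        maxlen=MAX_LENGTH,
        padding="post",     # assumption: zeros appended after the tokens
        truncating="post",  # assumption: extra tokens dropped from the end
    )


train_padded = to_padded(train_titles)
test_padded = to_padded(test_titles)
```

The same fitted tokenizer is reused for the test headlines, so the test data never influences the vocabulary.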

There are 4 classes in this dataset. The prediction output should be integer encoded. Your model should achieve at least 85% accuracy on the test data.
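One way to meet these expectations is sketched below. The layer choices and sizes are illustrative assumptions and are not known to reach the 85% target. The relevant parts are the 24-token integer input, the 4-unit softmax output, and the `sparse_categorical_crossentropy` loss, which accepts integer-encoded labels directly.

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 7000  # from the scenario
MAX_LENGTH = 24    # from the scenario
NUM_CLASSES = 4    # from the scenario

# Illustrative architecture only; the layer sizes are assumptions to tune.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LENGTH,), dtype="int32"),
    tf.keras.layers.Embedding(VOCAB_SIZE, 32),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# The sparse loss expects integer labels (0..3), not one-hot vectors.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Smoke test on an all-padding batch: one probability per class.
dummy_batch = np.zeros((2, MAX_LENGTH), dtype="int32")
probabilities = model.predict(dummy_batch, verbose=0)
predicted_classes = probabilities.argmax(axis=1)  # integer class per row
```

Taking `argmax` over the softmax output gives an integer class per headline, which can be compared directly with the integer labels.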

Lab Goals

  1. Retrieve and load the ag_news_subset datasets.
  2. Convert the headlines to token sequences.
  3. Build and train a model to classify the headlines.
  4. Evaluate your model with the test data.
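For the final goal, the accuracy and loss curves can be drawn from the `history` attribute of the object returned by `model.fit`. The helper below is a sketch: `plot_history` is a hypothetical name, and it assumes the model was compiled with `metrics=["accuracy"]`, so the dictionary contains `accuracy` and `loss` keys.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot accuracy and loss per epoch from a Keras history dictionary."""
    fig, (acc_ax, loss_ax) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, metric in ((acc_ax, "accuracy"), (loss_ax, "loss")):
        ax.plot(history_dict[metric], label=f"training {metric}")
        # Validation curves exist only if validation data was passed to fit.
        val_key = f"val_{metric}"
        if val_key in history_dict:
            ax.plot(history_dict[val_key], label=f"validation {metric}")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    return fig


# Hypothetical usage once a model has been trained:
#   history = model.fit(train_padded, train_labels, epochs=..., validation_split=...)
#   test_loss, test_accuracy = model.evaluate(test_padded, test_labels)
#   plot_history(history.history)
#   plt.show()
```

When the model is compiled with a single accuracy metric, `model.evaluate` returns the loss followed by the accuracy, which is the number to compare against the 85% target.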

As this is practice for the exam, you should attempt to solve the tasks on your own before checking the lab guide or the solution videos. Test your skills and see what areas you need to review.

Logging In To the Lab Environment

No environment is provided for this lab. This lab, which is a practice exam for the TensorFlow Developer Certificate, is meant to be completed in PyCharm running on your own hardware. It is important that you complete this lab on your own computer so you know how long different model architectures will take you to train, which will help you budget your time during the exam.
