In this lab, you will build a natural language processing model using TensorFlow that will classify text snippets into one of multiple categories. You will be performing the entire model creation process — from retrieving the data and formatting it properly, to designing a model architecture and training it to meet a desired metric score.
This lab is designed as a practice exam to test your skills in preparation for the TensorFlow Developer Certificate, so it is a very challenging exercise.
Before beginning this lab, you should have PyCharm installed on your local computer. Additionally, you should have installed all packages required by the TensorFlow Developer Certificate exam.
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Retrieve and Load the ag_news_subset Datasets
- Explore the ag_news_subset dataset in the TensorFlow Datasets Catalog.
- Retrieve and load the training and testing data.
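The loading step above might look roughly like the following. This is a minimal sketch, not the lab's reference solution: it assumes the `tensorflow_datasets` package is installed, and the helper names `load_ag_news` and `describe` are hypothetical.

```python
def load_ag_news(data_dir=None):
    """Return the train split, the test split, and the dataset metadata."""
    # Imported inside the function so nothing heavy loads until data is requested.
    import tensorflow_datasets as tfds

    (train_ds, test_ds), info = tfds.load(
        "ag_news_subset",
        split=["train", "test"],
        with_info=True,
        data_dir=data_dir,  # None means the default TFDS cache directory
    )
    return train_ds, test_ds, info


def describe(info):
    """Collect the facts worth checking before writing any preprocessing code."""
    label = info.features["label"]
    return {
        "features": sorted(info.features.keys()),
        "num_classes": label.num_classes,
        "class_names": label.names,
        "train_examples": info.splits["train"].num_examples,
        "test_examples": info.splits["test"].num_examples,
    }
```

Calling `train_ds, test_ds, info = load_ag_news()` followed by `print(describe(info))` is one way to confirm which field holds the headline and how the labels are encoded before moving on. The first call downloads the data, so it needs network access.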
- Convert the Headlines to Token Sequences
- Review the problem description to determine the parameters you’ll need for the token sequences.
- Parse the article titles and labels from the datasets, converting them to the proper data types.
- Fit the Tokenizer to the training data.
- Convert the article titles to tokens.
- Convert the tokens to sequences.
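The parsing and tokenizing steps above could be sketched as follows. The values of `VOCAB_SIZE`, `MAX_LENGTH`, and `OOV_TOKEN` are placeholders chosen for illustration, as are the post-padding and post-truncation settings; the real parameters come from the problem description. The helper names are hypothetical, and the sketch assumes a TensorFlow 2.x release in which `tf.keras.preprocessing.text.Tokenizer` is available.

```python
# Placeholder values: take the real ones from the problem description.
VOCAB_SIZE = 10000
MAX_LENGTH = 20
OOV_TOKEN = "<OOV>"


def parse_examples(examples):
    """Split records shaped like {'title': bytes, 'label': int} into two lists."""
    titles, labels = [], []
    for example in examples:
        titles.append(example["title"].decode("utf-8"))  # bytes -> str
        labels.append(int(example["label"]))             # NumPy integer -> int
    return titles, labels


def make_sequences(train_titles, test_titles,
                   vocab_size=VOCAB_SIZE, max_length=MAX_LENGTH):
    """Fit a Tokenizer on the training titles and encode both splits."""
    import tensorflow as tf

    tokenizer = tf.keras.preprocessing.text.Tokenizer(
        num_words=vocab_size, oov_token=OOV_TOKEN
    )
    tokenizer.fit_on_texts(train_titles)  # the vocabulary comes from training data only

    def encode(titles):
        # Words become integer ids, then every row is padded or cut to max_length.
        sequences = tokenizer.texts_to_sequences(titles)
        return tf.keras.preprocessing.sequence.pad_sequences(
            sequences, maxlen=max_length, padding="post", truncating="post"
        )

    return tokenizer, encode(train_titles), encode(test_titles)
```

With TensorFlow Datasets, `parse_examples(tfds.as_numpy(train_ds))` yields the titles and labels for one split. Wrapping the label list in `numpy.array(...)` before training gives Keras the integer array it expects.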
- Build and Train a Model to Classify the Headlines
- Review the model expectations to understand how your model should accept and output data.
- Create an appropriate neural network model using Keras.
- Compile your model with the correct loss function for the problem and label type.
- Train your model to reach the desired accuracy. Remember to capture the history!
- Save your model.
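One possible shape for the build, compile, train, and save steps is sketched below. The architecture, layer sizes, epoch count, validation split, and file name are all illustrative assumptions rather than the lab's expected answer. The inputs are assumed to be NumPy arrays of padded sequences and integer labels.

```python
NUM_CLASSES = 4  # ag_news_subset has four categories


def choose_loss(labels_are_one_hot):
    """Pick the cross-entropy variant that matches the label encoding."""
    if labels_are_one_hot:
        return "categorical_crossentropy"
    return "sparse_categorical_crossentropy"  # integer class ids


def build_model(vocab_size, max_length, embedding_dim=16, num_classes=NUM_CLASSES):
    """Small embedding-plus-pooling classifier; the sizes are arbitrary starting points."""
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_length,)),
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss=choose_loss(labels_are_one_hot=False),
        metrics=["accuracy"],
    )
    return model


def train_and_save(model, train_x, train_y, epochs=10, path="ag_news_model.h5"):
    """Train, keep the History object for later plotting, then save the model."""
    history = model.fit(train_x, train_y, epochs=epochs, validation_split=0.1)
    model.save(path)  # hypothetical file name; use the one the instructions ask for
    return history
```

The softmax output has one unit per category, which pairs with `sparse_categorical_crossentropy` when the labels are plain integers. If the labels were one-hot encoded instead, `categorical_crossentropy` would be the matching choice.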
- Evaluate Your Model with the Test Data
- Generate model statistics on the test data. Ensure you’ve met or exceeded the desired accuracy.
- Plot your model’s accuracy and loss for the training process.
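The evaluation and plotting steps above can be sketched like this. It assumes the model was compiled with a single `accuracy` metric, so that `evaluate` returns a loss and an accuracy, and that `matplotlib` is installed. The helper names are hypothetical, and the target accuracy is deliberately left as a required argument because it comes from the problem description.

```python
def evaluate_model(model, test_x, test_y, target_accuracy):
    """Score the model on held-out data and compare against the required accuracy."""
    loss, accuracy = model.evaluate(test_x, test_y, verbose=0)
    return {
        "loss": float(loss),
        "accuracy": float(accuracy),
        "meets_target": bool(accuracy >= target_accuracy),
    }


def plot_history(history_dict, path=None):
    """Plot accuracy and loss per epoch from a Keras History.history dictionary."""
    import matplotlib.pyplot as plt

    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    for metric, ax in (("accuracy", ax_acc), ("loss", ax_loss)):
        ax.plot(history_dict[metric], label="training")
        if "val_" + metric in history_dict:  # present only if validation data was used
            ax.plot(history_dict["val_" + metric], label="validation")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    if path:
        fig.savefig(path)
    else:
        plt.show()
    return fig
```

Passing `history.history` from the training run to `plot_history` shows whether training and validation curves diverge, which is a common sign of overfitting when the test accuracy falls short of the target.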