Generating Text Using TensorFlow

1 hour
  • 4 Learning Objectives

About this Hands-on Lab

One fun thing we can do with machine learning is generating text! We can build a model that will predict the next word in a sequence. This combines interesting aspects of both sequence processing and natural language processing. This lab will give you more practice in both of these important areas of machine learning.

### Prerequisites
This lab is designed to be completed in PyCharm running on your machine. You should have PyCharm and TensorFlow installed before attempting this lab. We will not be covering this setup in the lab.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Retrieve and Load the Tiny Frankenstein Data
  1. Download the Tiny Frankenstein dataset: https://storage.googleapis.com/acg-datasets/tiny_frankenstein.tgz.
  2. Load the raw text into the program.
  3. Convert the text to lowercase.
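The steps above might be sketched like this. The filename inside the archive is an assumption (the lab doesn't state it), so check what actually gets extracted:

```python
import tarfile
import urllib.request

DATA_URL = "https://storage.googleapis.com/acg-datasets/tiny_frankenstein.tgz"

def download_and_extract(url=DATA_URL, archive="tiny_frankenstein.tgz"):
    """Download the .tgz archive and extract its contents into the working directory."""
    urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall()

def load_text(path):
    """Read the raw text file and convert it to lowercase (step 3)."""
    with open(path, encoding="utf-8") as f:
        return f.read().lower()

# download_and_extract()
# text = load_text("tiny_frankenstein.txt")  # hypothetical filename -- verify after extracting
```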
Turn the Text Data into Token Sequences
  1. Train a tokenizer on the Frankenstein text using every available word.
  2. Find the total number of learned words.
  3. Convert the Frankenstein text to tokens.
  4. Convert the tokens to sequences.
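One way these steps might look, using the legacy `tf.keras.preprocessing` APIs (the ones current for the TensorFlow Developer Certificate Exam; newer Keras releases have moved or removed them). The short sample string stands in for the Frankenstein text:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

text = "it was on a dreary night of november"  # stand-in for the Frankenstein text

tokenizer = Tokenizer()             # no num_words cap: learn every available word
tokenizer.fit_on_texts([text])
vocab_size = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved for padding

tokens = tokenizer.texts_to_sequences([text])[0]

# Build n-gram sequences: each sequence ends with the token to be predicted.
sequences = pad_sequences([tokens[: i + 1] for i in range(1, len(tokens))])
xs, ys = sequences[:, :-1], sequences[:, -1]  # padded prefixes and next-token targets
```

Splitting `xs` and `ys` this way pairs each padded prefix with the token that immediately follows it, which is exactly what the model will be trained to predict.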
Build a Model to Predict Tokens
  1. Create a recurrent neural network to learn the token sequences.
    • Hint: Bidirectional LSTMs work really well for this task!
  2. Compile the model with an appropriate loss function and optimizer.
    • Hint: You might want to increase the learning rate of the optimizer. This model can take a very long time to converge with the default learning rate.
  3. Train your model for 10-20 epochs.
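One possible shape for this model; the layer sizes and the 0.01 learning rate are illustrative choices, not values prescribed by the lab:

```python
import tensorflow as tf

vocab_size, seq_len = 2000, 20  # placeholders -- use the values from your tokenized data

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(seq_len,)),
    tf.keras.layers.Embedding(vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])

# Raising Adam's learning rate above its 0.001 default helps this model converge faster.
model.compile(
    loss="sparse_categorical_crossentropy",  # targets are integer token ids
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    metrics=["accuracy"],
)
# model.fit(xs, ys, epochs=15)  # xs, ys from the tokenizing step
```

`sparse_categorical_crossentropy` works with integer targets; if you one-hot encode `ys` instead, switch to `categorical_crossentropy`.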
Generate Text!
  1. Create a reverse token lookup dictionary from the tokenizer’s word_index.
  2. Create tokenized text to start the predictions.
  3. Use the model to predict the next token.
  4. Append the predicted token to the text, truncating earlier values.
  5. Continue using the model to predict as many tokens as you’d like!
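The generation steps above could be wrapped in a helper like this sketch, where `model`, `tokenizer`, and `seq_len` are assumed to come from the earlier objectives:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_text(model, tokenizer, seed_text, n_words, seq_len):
    """Repeatedly predict the next token and feed it back into the model."""
    # Reverse token lookup: id -> word (step 1).
    index_to_word = {index: word for word, index in tokenizer.word_index.items()}
    tokens = tokenizer.texts_to_sequences([seed_text.lower()])[0]
    words = seed_text.lower().split()
    for _ in range(n_words):
        padded = pad_sequences([tokens], maxlen=seq_len)
        probs = model.predict(padded, verbose=0)[0]
        next_token = int(np.argmax(probs))
        words.append(index_to_word.get(next_token, ""))
        # Append the predicted token and truncate earlier values (step 4).
        tokens = (tokens + [next_token])[-seq_len:]
    return " ".join(words)
```

For example, `generate_text(model, tokenizer, "the monster", n_words=50, seq_len=seq_len)`. Taking the argmax always picks the single most likely word; sampling from `probs` instead gives more varied output.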

Additional Resources

Scenario

Let's generate text in the style of Mary Shelley's Frankenstein. The world needs more Gothic-style science fiction! Starting with the Tiny Frankenstein dataset, which contains the first few chapters of the novel, we'll teach a model to write more text in the same style.

Lab Goals

  1. Retrieve and load the Tiny Frankenstein data.
  2. Turn the text data into token sequences.
  3. Build a model to predict tokens.
  4. Generate text!

Logging In To the Lab Environment

No environment is provided for this lab. This lab is meant to be completed in PyCharm running on your own hardware in preparation for the TensorFlow Developer Certificate Exam. If you don't have PyCharm or are not working toward the certification, you can use the provided Google account credentials, or your own account, to complete the tasks in Google Colab, which provides free hosting and compute power for Jupyter Notebooks. If you use the provided credentials to access Colab, make sure to save a copy of your work locally before the end of the lab.

What are Hands-on Labs

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
