We are going to replicate a famous experiment performed by Sir Ronald Fisher in his 1936 paper *The use of multiple measurements in taxonomic problems*, in which he classified iris flowers by species from measurements of their sepals and petals. Why are we repeating a nearly 100-year-old experiment? Because now we can let a machine learn how to do it instead of doing the math ourselves. Yeah!
In this lab, you will see how to load data using TensorFlow’s dataset API, visualize it using Pandas, and then train a machine learning model using Keras. You will be using Python in a Jupyter notebook, but all of the code is provided. No experience with any of those is required, but some familiarity with programming will help you get more out of this lab.
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Load the Data
TensorFlow provides the training data you need in the TensorFlow Datasets API. Using the Datasets API, download the training data and split it into a NumPy array of features and a NumPy array of labels.
TensorFlow also provides test data, but it is not available through the Datasets API. Download the test data, load it, and split it into a NumPy array of features and a NumPy array of labels.
Pandas provides a convenient way to visualize the data (the next objective), so combine the features and labels in a Pandas DataFrame for the training data, and a DataFrame for the testing data.
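The loading steps above can be sketched roughly as follows. This is a minimal sketch, not the lab's own notebook code: it assumes the `tensorflow_datasets` package and its `iris` dataset for the training data, assumes the test file is a local CSV with one header row, four feature columns, and an integer label column, and uses made-up column names (`FEATURE_NAMES`) and helper names.

```python
import numpy as np
import pandas as pd

# Assumed column names for the four Iris measurements (not taken from the lab).
FEATURE_NAMES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def load_training_arrays():
    """Download the Iris data through TensorFlow Datasets as NumPy arrays."""
    # Imported lazily: needs the tensorflow_datasets package and network access.
    import tensorflow_datasets as tfds

    # batch_size=-1 returns the whole split at once; as_supervised=True
    # yields a (features, label) pair instead of a dictionary.
    features, labels = tfds.as_numpy(
        tfds.load("iris", split="train", batch_size=-1, as_supervised=True)
    )
    return features, labels


def load_test_arrays(csv_path):
    """Split a downloaded CSV into a feature array and a label array.

    Assumes one header row, then rows of four features and an integer label;
    adjust skiprows or the column slices if your file is laid out differently.
    """
    data = np.loadtxt(csv_path, delimiter=",", skiprows=1)
    return data[:, :4].astype(np.float32), data[:, 4].astype(np.int64)


def to_dataframe(features, labels):
    """Combine features and labels into one DataFrame for inspection."""
    df = pd.DataFrame(features, columns=FEATURE_NAMES)
    df["label"] = labels
    return df
```

In a notebook you might then call `train_df = to_dataframe(*load_training_arrays())` and build the test DataFrame the same way from `load_test_arrays(...)`, passing the path of the file you downloaded.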
- Visualize the Data
Each of these steps helps you better understand the data you are working with:
Print the number of samples in the training and test datasets.
View common statistics about the features in each dataset.
View the raw data from 15 examples.
Plot each feature against all other features to see if there are any natural groupings.
Plot the data to show how strongly each feature separates the three species.
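One way these inspection steps might look with Pandas is sketched below. It assumes each DataFrame holds the numeric feature columns plus an integer column named `label` (a hypothetical name), and it uses a per-class box plot as one possible reading of "how strongly each feature separates the data"; the lab's notebook may well use a different plot.

```python
import matplotlib.pyplot as plt
import pandas as pd


def summarize(train_df, test_df, n_rows=15):
    """Print sample counts, common statistics, and a few raw rows."""
    print(f"training samples: {len(train_df)}, test samples: {len(test_df)}")
    print(train_df.describe())
    print(test_df.describe())
    print(train_df.head(n_rows))


def plot_feature_pairs(df, label_col="label"):
    """Scatter every feature against every other, colored by class label."""
    features = df.drop(columns=[label_col])
    # Extra keyword arguments such as c= are passed on to the scatter plots.
    return pd.plotting.scatter_matrix(features, c=df[label_col], figsize=(8, 8))


def plot_feature_separation(df, label_col="label"):
    """Draw one box plot per feature, grouped by class label.

    Boxes that barely overlap between classes suggest a feature that
    separates those classes well.
    """
    return df.boxplot(by=label_col, figsize=(8, 8))
```

In a Jupyter notebook the figures render inline after each call; in a plain script, finish with `plt.show()`.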
- Teach the Machine About Irises
Create your model and make predictions!
Create and compile a Keras model for classifying the Iris data. Include accuracy as a model metric for easy evaluation.
Fit the Keras model to the training data.
Evaluate the model’s accuracy on the test dataset.
Show the model’s predictions for the test data.
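The four modeling steps above could be sketched like this. The layer sizes, optimizer, loss, epoch count, and helper names are illustrative assumptions rather than the lab's actual choices; the only things taken from the objectives are that the model is a Keras classifier compiled with accuracy as a metric, fitted on the training data, and evaluated on the test data.

```python
import numpy as np
from tensorflow import keras


def build_model(n_features=4, n_classes=3):
    """Create and compile a small classifier (architecture is an assumption)."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        # One output per species; softmax turns the outputs into probabilities.
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # labels are integers 0..n_classes-1
        metrics=["accuracy"],
    )
    return model


def train_and_evaluate(model, train_x, train_y, test_x, test_y, epochs=50):
    """Fit on the training data, then score and predict on the test data."""
    model.fit(train_x, train_y, epochs=epochs, verbose=0)
    _, accuracy = model.evaluate(test_x, test_y, verbose=0)
    # Pick the most probable class for each test example.
    predicted = np.argmax(model.predict(test_x, verbose=0), axis=1)
    return accuracy, predicted
```

Comparing `predicted` with the true test labels side by side shows which examples the model gets wrong, which is often more informative than the accuracy figure alone.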