Cleanse Outlying Data Using the pandas Python Package

30 minutes
  • 4 Learning Objectives

About this Hands-on Lab

In this lab, we will load a CSV file into a pandas DataFrame. Once loaded, we will remove rows whose `age` is more than 3 standard deviations from the mean, and rows whose `hours-per-week` falls below the 10% quantile or above the 90% quantile. We will then write the cleansed data to a new file.

Basic Python programming skills will be required for this lab. If you need a refresher, check out the following course:
– [Certified Associate in Python Programming Certification](https://acloud.guru/overview/8169e8e7-91a7-4d92-b278-4dd08c787dc6)

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Load the Data File

Load the data.csv file into a pandas DataFrame.
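
As a minimal sketch of this objective, the snippet below wraps `pandas.read_csv` in a small helper. The helper name `load_data` is hypothetical, and the default path assumes `data.csv` sits in the current working directory and has a header row.

```python
import pandas as pd


def load_data(path="data.csv"):
    """Read the CSV file at `path` into a DataFrame.

    `load_data` is an illustrative helper, not part of the lab itself.
    It assumes the first row of the file holds the column names.
    """
    return pd.read_csv(path)
```

After calling `df = load_data()`, `df.describe()` is a quick way to inspect the range of the `age` and `hours-per-week` columns before removing anything.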

Resolve Outlying age Values

Remove rows with an age more than 3 standard deviations from the mean.
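
One possible way to express this filter is sketched below. The helper name `drop_age_outliers` and its parameters are hypothetical. The sketch uses pandas' default sample standard deviation (`ddof=1`); the lab may intend a different convention.

```python
import pandas as pd


def drop_age_outliers(df, column="age", n_std=3):
    """Keep rows whose `column` value lies within `n_std` standard
    deviations of the column mean (illustrative helper)."""
    mean = df[column].mean()
    std = df[column].std()  # pandas default: sample std, ddof=1
    # Rows with a missing value compare as False and are dropped as well.
    within = (df[column] - mean).abs() <= n_std * std
    return df[within]
```

Comparing `len(df)` before and after the call shows how many rows were removed.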

Resolve Outlying hours-per-week Values

Remove rows with hours-per-week below the 10% quantile or above the 90% quantile.
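
The quantile filter could be sketched as follows. The helper name `drop_hours_outliers` and its parameters are hypothetical, and the sketch assumes that values exactly equal to a quantile boundary are kept.

```python
import pandas as pd


def drop_hours_outliers(df, column="hours-per-week", lower=0.10, upper=0.90):
    """Keep rows whose `column` value lies between the `lower` and
    `upper` quantiles, boundaries included (illustrative helper)."""
    low = df[column].quantile(lower)
    high = df[column].quantile(upper)
    # Series.between is inclusive on both ends by default.
    return df[df[column].between(low, high)]
```

Note that both quantiles are computed on the DataFrame passed in, so applying this after the `age` filter gives slightly different cut-offs than applying it to the raw data.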

Write the Data to a New File

Write the data to a new file named cleaned_data.csv.
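
A minimal sketch of the write step is shown below. The helper name `write_data` is hypothetical, and passing `index=False` is an assumption: it keeps the DataFrame's row index out of the output so the file has the same columns as the input.

```python
import pandas as pd


def write_data(df, path="cleaned_data.csv"):
    """Write `df` to a CSV file at `path` (illustrative helper)."""
    # index=False is an assumption: omit the row index from the file.
    df.to_csv(path, index=False)
```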

Additional Resources

The Scenario

You are working as a database admin, and while attempting to load a CSV data file, you discover it has several outlying values in the age and hours-per-week columns. These rows will need to be removed. Thankfully, you have learned some methods to find and remove outlying data from the awesome courses on acloud.guru!

You will take the following steps to clean up the outlying values:

  • Load the CSV file into a DataFrame.
  • Resolve outlying age values.
  • Resolve outlying hours-per-week values.
  • Write the data to a new file.

Log in to the server over SSH using the credentials provided.

The data.csv file is already available in the lab instance, but if you'd like to follow along on another machine, you may download it from here.

This data was sourced from the Center for Machine Learning and Intelligent Systems. Learn more here.

