Data Preparation (Import and Cleaning) for Python

By David Thomas

This course explores the tools available in the Python language for data preparation.

6 hours
  • 37 Lessons
  • 8 Hands-On Labs

About the course

Python can be a powerful tool for data preparation. In this course, we will briefly cover how to connect to various types of databases, then move on to using the pandas package for data preparation. We will work through examples of cleansing missing and outlying data, and of visualizing and exploring data. Beyond pandas, we will also look at preprocessing data for machine learning with the scikit-learn package.

Before beginning this course, you should have a strong knowledge of Python and data approaches. Check out the Prerequisite and Related Courses lesson in the Introduction section for a starting point.
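
As a rough illustration of the workflow described above (reading rows through the Python DB-API, then cleansing missing values), here is a minimal sketch. It is not taken from the course materials: it uses the standard library's sqlite3 module as a stand-in embedded database and plain Python with the statistics module in place of pandas. The `readings` table, its columns, and both helper functions are hypothetical names chosen for the example.

```python
import sqlite3
import statistics


def load_readings(conn):
    """Return (id, value) rows from the hypothetical `readings` table."""
    cur = conn.cursor()
    cur.execute("SELECT id, value FROM readings ORDER BY id")
    return cur.fetchall()


def fill_missing_with_median(rows):
    """Replace missing (None) values with the median of the observed values."""
    observed = [value for _, value in rows if value is not None]
    median = statistics.median(observed)
    return [(row_id, median if value is None else value) for row_id, value in rows]


# Build a small in-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings (id, value) VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 12.0), (4, 14.0)],  # row 2 has a missing value
)
conn.commit()

rows = load_readings(conn)
cleaned = fill_missing_with_median(rows)
conn.close()

print(cleaned)
```

Median imputation is only one of several possible strategies for missing data; dropping incomplete rows or filling with a fixed value are equally simple alternatives, and the right choice depends on the dataset.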

  • Chapter 1: Introduction (3 Lessons, 6:23)

    Course Introduction

    1:19

    The Python Packages Used in This Course

    4:09

    Prerequisite and Related Courses

    0:55
  • Chapter 2: Database Access (7 Lessons, 1:59:31)

    Section Introduction: The Python DB-API

    2:57

    Relational Databases

    9:46

    NoSQL (Non-Relational) Databases

    10:14

    Embedded Databases

    4:21

    Section Recap: Database Access

    2:13

    Using PostgreSQL with Python

    45:00 Hands-On Lab

    Using MongoDB with Python

    45:00 Hands-On Lab
  • Chapter 3: Data Visualization (9 Lessons, 1:58:33)

    Section Introduction: Why Data Visualization?

    1:36

    What Kind of Data Can Python Read?

    4:02

    Reading and Writing Tabular Data

    3:22

    Calculating Summary Statistics

    6:42

    Data Profiling

    10:26

    Section Recap: Data Visualization

    2:25

    Converting CSV data to JSON using the Pandas Python Package

    30:00 Hands-On Lab

    Generating Summary Statistics using the Pandas Python Package

    30:00 Hands-On Lab

    Profiling Data using the pandas_profiling Python Package

    30:00 Hands-On Lab
  • Chapter 4: Data Cleansing (7 Lessons, 1:21:35)

    Section Introduction: Making Your Data Sparkle and Shine

    1:34

    Missing and Invalid Data

    6:25

    Outlying Data

    6:42

    String Processing

    5:41

    Section Recap: Data Cleansing

    1:13

    Cleansing missing data using the Pandas Python Package

    30:00 Hands-On Lab

    Cleansing outlying data using the Pandas Python Package

    30:00 Hands-On Lab
  • Chapter 5: Preprocessing Data for Machine Learning (8 Lessons, 54:23)

    Section Introduction: What Is the sklearn.preprocessing Package?

    2:13

    Standardizing Your Dataset

    4:11

    Non-Linear Transformation

    4:58

    Normalization and Discretization

    4:07

    Categorical Features

    3:20

    Polynomial Features and Custom Transformers

    3:55

    Section Recap: Preprocessing Data for Machine Learning

    1:39

    Pre-processing Data with the scikit-learn Python Package

    30:00 Hands-On Lab
  • Chapter 6: Conclusion (3 Lessons, 6:45)

    Python Pitfalls

    5:11

    Course Summary

    1:06

    Conclusion and What’s Next

    0:28

What are Hands-on Labs?

What's the difference between theoretical knowledge and real skills? Practical real-world experience. That's where Hands-on Labs come in! Hands-on Labs are guided, interactive experiences that help you learn and practice real-world scenarios in real cloud environments. Hands-on Labs are seamlessly integrated in courses, so you can learn by doing.
