Profile Data Using the pandas_profiling Python Package

30 minutes
  • 4 Learning Objectives

About this Hands-on Lab

In this lab, we will load a CSV file into a pandas DataFrame. Once loaded, we will use the `pandas_profiling` package to generate a profile report on the data. We will then view this report using a web browser.

Basic Python programming skills are required for this lab. If you need a refresher, check out the following course:
– [Certified Associate in Python Programming Certification](https://acloud.guru/overview/8169e8e7-91a7-4d92-b278-4dd08c787dc6)

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Install pandas_profiling

Install the pandas_profiling package using pip. Be sure to use Python version 3.
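
On the lab instance this step is typically a single command such as `python3 -m pip install pandas-profiling` (pip treats hyphens and underscores in package names the same way). The same step can be sketched from Python as shown here. Running pip through `sys.executable` ties the install to the interpreter you are actually using. The helper names `pip_install_command` and `install` are illustrative and not part of the lab.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package="pandas-profiling"):
    """Install the package, refusing to run on anything older than Python 3."""
    if sys.version_info < (3,):
        raise RuntimeError("Python 3 is required for this lab")
    # check_call raises CalledProcessError if pip exits with a non-zero status.
    subprocess.check_call(pip_install_command(package))
```

Calling `install()` once from a Python 3 prompt performs the installation.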

Load the CSV File into a DataFrame

Load the provided CSV file into a pandas DataFrame object.
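
A minimal sketch of this step, assuming the file is named `data.csv` and sits in the current working directory. The function names `load_csv` and `summarize` are illustrative. The lab does not say whether the file has a header row, so it is worth inspecting the first few rows after loading.

```python
import pandas as pd


def load_csv(path="data.csv"):
    """Read a CSV file into a pandas DataFrame."""
    # By default read_csv infers column types and uses the first row as the header.
    return pd.read_csv(path)


def summarize(df):
    """Return a one-line size summary as a quick sanity check."""
    rows, cols = df.shape
    return f"{rows} rows x {cols} columns"
```

For example, `df = load_csv()` followed by `print(summarize(df))` and `df.head()` gives a quick look at what was loaded.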

Generate the Report and Save to a File

Generate the profile report and save it to a file named profile_report.html.
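
One way this step might look, assuming a pandas_profiling release that provides the `ProfileReport` class and its `to_file` method. The function name `write_profile_report` and the report title are hypothetical choices for illustration; only the output file name comes from the lab.

```python
def write_profile_report(df, output_path="profile_report.html",
                         title="Census Data Profile"):
    """Profile a DataFrame and write the result as a standalone HTML file."""
    # Imported here so this module still loads before the package is installed.
    from pandas_profiling import ProfileReport

    # The title is an illustrative value, not something the lab specifies.
    report = ProfileReport(df, title=title)
    # to_file picks the output format from the file extension (.html here).
    report.to_file(output_path)
    return output_path
```

With a DataFrame `df` already loaded from the CSV file, `write_profile_report(df)` writes `profile_report.html` to the current directory. Profiling can take a while on larger files.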

Start a Simple Web Server on Port 8080

Lastly, start a simple web server using Python's http.server module, listening on port 8080, to make the report accessible.

You can access the report at: http://PUBLIC_IP:8080/profile_report.html. Replace PUBLIC_IP with the public IP address of the lab instance.
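
From a shell, running `python3 -m http.server 8080` in the directory that holds the report is usually enough. The sketch below does the same thing programmatically with the standard library. The function name `make_report_server` and its parameters are illustrative.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_report_server(directory=".", port=8080, bind=""):
    """Create an HTTP server that serves files from `directory`."""
    # The `directory` argument to SimpleHTTPRequestHandler needs Python 3.7+.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    # An empty bind address listens on all interfaces, so the public IP works.
    return HTTPServer((bind, port), handler)
```

Calling `make_report_server().serve_forever()` from the directory containing `profile_report.html` blocks and serves requests until interrupted with Ctrl+C.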

Additional Resources

The Scenario

You work as a database admin and have been drawn into a dispute between coworkers over the validity of a data file containing U.S. Census data. You are not a Census expert, but you recently learned about the pandas_profiling package from the awesome courses on acloud.guru. The information in a profile report may be just what is needed to settle the question. Because the report is generated as HTML, you can start a quick HTTP server on your workstation to share it. Return to your workstation and complete the following objectives:

  • Install pandas_profiling.
  • Load the CSV file into a DataFrame.
  • Generate the report and save it to a file.
  • Start a simple web server on port 8080.

After you complete the above objectives, you can send your coworkers the following link to access the report: http://PUBLIC_IP:8080/profile_report.html.


Log in to the server over SSH using the credentials provided.

The data.csv file is already available on your workstation (the lab instance), but if you'd like to follow along on another machine, you can download it from here.

The data was sourced from the Center for Machine Learning and Intelligent Systems. Learn more here.

