Skip to content

Contact sales

By filling out this form and clicking submit, you acknowledge our privacy policy.
  • Labs icon Lab
  • A Cloud Guru
Google Cloud Platform icon
Labs

Using Pandas Data Frames and Pivot Tables in Python

Pandas is a very popular library for performing data analysis with Python. Two common features that you'll use when working with pandas are data frames and pivot tables. In this hands-on lab, you'll go through the process of using data frames and pivot tables to analyze some employee information and provide some insight to others on your team. *Warning:* This is a lab designed as part of a professional level course and is difficult. The lab asks you to accomplish something using exact methods and functionality of the `pandas` library that might not have been covered in lessons. To feel comfortable completing this lab, you'll want to know how to do the following: * Use pandas data frames. Watch the "Creating and Using Dataframes" video from the [Using Python's Math, Science, and Engineering Libraries](https://linuxacademy.com/cp/modules/view/id/621) course. * Create pivot tables. Watch the "Creating Pivot Tables" video from the [Using Python's Math, Science, and Engineering Libraries](https://linuxacademy.com/cp/modules/view/id/621) course. * Comfortability reading the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html) to find new functions and methods to use to accomplish your goal.

Google Cloud Platform icon
Labs

Path Info

Level
Clock icon Advanced
Duration
Clock icon 45m
Published
Clock icon Aug 21, 2020

Contact sales

By filling out this form and clicking submit, you acknowledge our privacy policy.

Table of Contents

  1. Challenge

    Install Pandas

    Before we can use pandas to analyze our employee data we'll need to make sure that it is installed.

  2. Challenge

    Create a data frame from the `employee_info` list and determine the number of unique job titles.

    Our pay_analysis.py script already sets up a list of dictionary objects with our employee information named employee_info. We need to use this list to create a pandas.DataFrame for us to be able to perform our analysis. Get the column names from an item in the list and create a new data frame called employee_frame.

    Note: We also need to import pandas.

  3. Challenge

    Use a pivot table to determine the salary difference between people doing the same job depending on their gender.

    To determine the average pay difference between men and women with the same job title, we'll first need to determine which jobs have both men and women doing them. We can do this by grouping our data frame information by the job title first and then creating a pandas.Series that has the job titles as the index values and the number of unique genders as the values.

    Next, using job_title_unique_gender_count, we'll take all of the items that have a value of 2, drop the items that will be adjusted to a NaN (because they aren't a 2) and then get a pandas.Index object with just the names by accessing the index of our pandas.Series. This seems a little complicated, but it's a good way for us to get just a list of the job titles where the number of genders in that role is 2.

    After that, we create a new data frame from our employee_frame data that only contains employees in a job that includes both genders.

    The last two things we do are creating our pivot table based on the job title where we have a column for each gender and adding a column to the table to hold the pay difference between female and male employees.

  4. Challenge

    Use a pivot table to determine the salary differences of people doing the same job depending on their age.

    Our final pivot table will work a lot like the gender_difference_table, except we need to categorize each of our employees into an age range before we start making our comparisons. To do this, we'll use the pandas.cut function and add the resulting series as a new column on our employee_frame object, calling it age_range. There are 2 possible categories for each employee to fall under "18 - 40" and "41 - 80". We're going to also add a third, unused category to the list of categories for this column so that we can add a column to our pivot table later.

    The data type of the age_range column is category, and it is important that when we're adding a category to it that we use the inplace=True option. If we don't do this, then we'll get a new category returned to us and won't modify the employee_frame at all.

    Next, we need to determine which job titles have at least one employee from each of the age ranges. This code is virtually identical to what we did to determine while job titles had employees from each of the gender options.

    Now that we have a new data frame, we're ready to create our pivot table and populate a new 'difference' column. This will look very similar to our other pivot table, except we want to create columns for each of the values in our age_range column. Here's the rest of the code that we need to write to create our table and calculate the differences in pay:

The Cloud Content team comprises subject matter experts hyper focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!

What's a lab?

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.

Provided environment for hands-on practice

We will provide the credentials and environment necessary for you to practice right within your browser.

Guided walkthrough

Follow along with the author’s guided walkthrough and build something new in your provided environment!

Did you know?

On average, you retain 75% more of your learning if you get time for practice.

Start learning by doing today

View Plans