Use pandas DataFrames on Excel Data

30 minutes
  • 4 Learning Objectives

About this Hands-on Lab

In this lab, we take a .csv file and create an Excel workbook out of it using pandas.

A PDF of the notebook for this lab is available [here](https://github.com/linuxacademy/content-python-for-database-and-reporting/blob/master/pdf/hol_4_2_l_solution.pdf).

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Start Jupyter Notebook Server and Access on Your Local Machine

Connecting to the Jupyter Notebook Server

Make sure that you have activated the virtual environment!

  1. Use the following to activate the virtual environment:
conda activate base
  2. To start the server, run the following:
python get_notebook_token.py

This simple script starts the Jupyter Notebook server and keeps it running independently of the terminal session.

The script prints a token in the terminal. Copy it and save it to a text file on your local machine.

On Your Local Machine

  1. In a terminal window, enter the following:
ssh -N -L localhost:8087:localhost:8086 cloud_user@<the public IP address of the Playground server>

You will be asked for your password; this is the password you use to log in to the Playground remote server.

Leave this terminal open. It will look as though nothing has happened, but the window must remain open for as long as you use the Jupyter Notebook server in this session.

  2. In the browser of your choice, enter the following address:

http://localhost:8087

This will open a Jupyter Notebook site that asks for the token you copied from the remote server.

Read the File Into a DataFrame

# open the file and print its first two lines to inspect the format;
# the with block closes the file automatically
with open('dow_jones_index.data') as f:
    print(f.readline())
    print(f.readline())
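
Reading the first lines by eye is usually enough, but the delimiter can also be checked programmatically. The following is a minimal sketch using csv.Sniffer from the standard library. The sample text is made up for illustration and only stands in for the lines printed above; the column names other than stock are assumptions.

```python
import csv

# Made-up sample standing in for the first two lines of the lab file.
sample = (
    "quarter,stock,date,close\n"
    "1,GE,1/7/2011,$18.43\n"
)

# Restrict the candidates to common delimiters so that characters such as
# '/' or '$' inside the values are never considered.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
print(repr(dialect.delimiter))
```

With the real file, the sample could be the text returned by f.read(1024) on the open file handle.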

The file appears to be comma-separated, so read it into a DataFrame with pandas.

import pandas as pd

stock_df = pd.read_csv('dow_jones_index.data')

stock_df.head()
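
Before filtering, it can help to confirm that the requested tickers actually appear in the data. The sketch below assumes a stock column, as used later in this lab. The sample rows and the other column names are made up for illustration, and missing_tickers is a hypothetical helper, not part of pandas.

```python
import io
import pandas as pd

# Made-up rows standing in for dow_jones_index.data; only the 'stock'
# column name comes from the lab, the rest is assumed for illustration.
sample_csv = io.StringIO(
    "quarter,stock,date,close\n"
    "1,GE,1/7/2011,$18.43\n"
    "1,IBM,1/7/2011,$147.93\n"
    "1,AA,1/7/2011,$16.42\n"
)
sample_df = pd.read_csv(sample_csv)

def missing_tickers(df, wanted, column="stock"):
    """Return the requested tickers that never appear in `column`, sorted."""
    present = set(df[column].unique())
    return sorted(set(wanted) - present)

print(sample_df.shape)                                    # (3, 4)
print(missing_tickers(sample_df, ["GE", "IBM", "KRFT"]))  # ['KRFT']
```

Against the lab data, the call would be missing_tickers(stock_df, ['GE', 'IBM', 'KRFT']); an empty list means all three tickers are present.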

Create the Excel Workbook

Create a DataFrame for each of the requested stocks:

ge_df = stock_df[stock_df.stock=='GE']
ibm_df = stock_df[stock_df.stock=='IBM']
krft_df = stock_df[stock_df.stock=='KRFT']
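
The three filters above follow the same pattern, so they can also be collected in a dictionary keyed by ticker. This is a minimal sketch of that alternative, not the lab's own solution. The sample data is made up, and split_by_ticker is a hypothetical helper name.

```python
import pandas as pd

def split_by_ticker(df, tickers, column="stock"):
    """Return {ticker: rows of df for that ticker}, in the order requested."""
    return {ticker: df[df[column] == ticker] for ticker in tickers}

# Made-up sample data; the real lab would pass stock_df instead.
sample_df = pd.DataFrame(
    {
        "stock": ["GE", "IBM", "GE", "KRFT", "AA"],
        "close": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

frames = split_by_ticker(sample_df, ["GE", "IBM", "KRFT"])
print({ticker: len(frame) for ticker, frame in frames.items()})
# {'GE': 2, 'IBM': 1, 'KRFT': 1}
```

A dictionary keeps the ticker name next to its data, which is convenient when each DataFrame later needs a matching worksheet name.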

Write the Excel file

with pd.ExcelWriter('stocks.xlsx') as writer:
    ge_df.to_excel(writer, sheet_name='GE')
    ibm_df.to_excel(writer, sheet_name='IBM')
    krft_df.to_excel(writer, sheet_name='KRFT')
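
When the number of worksheets grows, the same write can be expressed as a loop over a dictionary of DataFrames. The sketch below assumes a mapping from worksheet name to DataFrame. write_stock_sheets is a hypothetical helper, and index=False is a design choice that differs from the lab code above, which writes the index as an extra column.

```python
import pandas as pd

def write_stock_sheets(frames, path):
    """Write each DataFrame in `frames` to its own worksheet in `path`.

    `frames` maps worksheet name -> DataFrame. Hypothetical helper, not
    part of pandas.
    """
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in frames.items():
            # index=False leaves the DataFrame index out of the worksheet
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
```

With the DataFrames from this lab, the call would look like write_stock_sheets({'GE': ge_df, 'IBM': ibm_df, 'KRFT': krft_df}, 'stocks.xlsx'). Writing .xlsx files requires an Excel engine such as openpyxl to be installed.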

Check the Excel Workbook Contains the Requested Data

Load the file with sheet_name=None, which returns a dictionary of DataFrames keyed by worksheet name, and then check that each worksheet is populated.

my_stock_df = pd.read_excel('stocks.xlsx', sheet_name=None)

my_stock_df.keys()
my_stock_df['GE']
my_stock_df['IBM']
my_stock_df['KRFT']
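
Inspecting each worksheet by eye works for three tickers; the same check can be sketched as a small function over the dictionary that read_excel returns. check_sheets is a hypothetical helper, and the in-memory dictionary below is a made-up stand-in for my_stock_df so the example runs without an Excel file.

```python
import pandas as pd

def check_sheets(sheets, expected, column="stock"):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name in expected:
        if name not in sheets:
            problems.append(f"missing worksheet: {name}")
            continue
        frame = sheets[name]
        if frame.empty:
            problems.append(f"empty worksheet: {name}")
        elif not (frame[column] == name).all():
            problems.append(f"worksheet {name} contains other tickers")
    return problems

# Stand-in for pd.read_excel('stocks.xlsx', sheet_name=None)
sheets = {
    "GE": pd.DataFrame({"stock": ["GE", "GE"]}),
    "IBM": pd.DataFrame({"stock": ["IBM"]}),
    "KRFT": pd.DataFrame({"stock": []}),
}

print(check_sheets(sheets, ["GE", "IBM", "KRFT"]))
# ['empty worksheet: KRFT']
```

Against the lab workbook, check_sheets(my_stock_df, ['GE', 'IBM', 'KRFT']) should return an empty list.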

Additional Resources

You are applying to be a junior-level data analyst at a financial planner's office. The senior-level data analyst has given you a file named dow_jones_index.data and a machine that has Anaconda installed. He asks that you produce an Excel file with one worksheet for each of three stock ticker symbols: GE, IBM, and KRFT.
