Forum Export: Exporting Data as JSON and CSV Using Python

45 minutes
  • 4 Learning Objectives

About this Hands-on Lab

Databases are great for storing information and letting us query it, but raw SQL and the default row shape aren't always the best options for people within our organization. If we know the schema of our database tables, we can create classes to model our data and then massage that data into other shapes that are more helpful for specific use cases. In this hands-on lab, we'll use an existing SQLAlchemy-based library to query our forum database and export data as both JSON and CSV for other employees in our organization.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create a Virtualenv and Install the `forum` Package

We have the source for our forum package downloaded to ~/forum. We're going to create a new Python 3.7 virtualenv and install the package locally. The project used Pipenv during development, so we can use it now to create the virtualenv and install the dependencies:

$ cd ~/forum
$ pipenv install
...

With the virtualenv in place, we just need to activate it and then install the forum package itself:

$ pipenv shell
...
(forum) $ pip install -e .
...

Now we can use the forum library from within Python scripts and the REPL.
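
As a quick sanity check that the editable install worked, we can open a REPL and import the models we'll be using later (exit() drops us back to the shell):

(forum) $ python
...
>>> from forum.models import Post, Comment
>>> exit()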

Write the Posts Query in the `export_csv.py` Script

The foundation of both of our export scripts will be the same, so let’s write that first. We need to do the following:

  • Create a SQLAlchemy engine to connect to our database server.
  • Create a Session class that is bound to our engine.
  • Create an instance of our Session class to make queries to the database.

This is all done with SQLAlchemy itself rather than relying on code in the forum package. We're also going to pass the database connection string in via a DB_URL environment variable when we run the script:

export_csv.py

import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

db_url = os.environ["DB_URL"]
engine = create_engine(db_url)
Session = sessionmaker(bind=engine)
session = Session()

With the SQLAlchemy boilerplate out of the way, we’re now ready to use our models. Because we need some information about the comments table but want to receive a row for each post, we’ll need to create a few different subqueries to calculate the various comment counts. Let’s write these subqueries now:

export_csv.py

# SQLAlchemy configuration omitted

from sqlalchemy.sql import func

from forum.models import Post, Comment

comments = (
    session.query(Comment.post_id, func.count("*").label("comments"))
    .group_by(Comment.post_id)
    .subquery()
)

negative_comments = (
    session.query(Comment.post_id, func.count("*").label("negative_comments"))
    .filter(Comment.sentiment == "negative")
    .group_by(Comment.post_id)
    .subquery()
)

positive_comments = (
    session.query(Comment.post_id, func.count("*").label("positive_comments"))
    .filter(Comment.sentiment == "positive")
    .group_by(Comment.post_id)
    .subquery()
)

We’ll use these three subqueries as part of the main query that we’ll be using to get the posts.

export_csv.py

# Previous code omitted

final_query = (
    session.query(
        Post,
        comments.c.comments,
        negative_comments.c.negative_comments,
        positive_comments.c.positive_comments,
    )
    .outerjoin(comments, Post.id == comments.c.post_id)
    .outerjoin(negative_comments, Post.id == negative_comments.c.post_id)
    .outerjoin(positive_comments, Post.id == positive_comments.c.post_id)
)

This query is a bit complicated, but it gets us exactly the information we want, and the amount of SQL we issue stays constant no matter how many posts and comments there are, because the three subqueries are folded into the main SELECT.
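
If we're curious what SQLAlchemy will actually send to the database, we can print the query object, which compiles it to SQL. This is just a temporary line for inspection, not part of the final script, and the exact SQL text will vary:

# Temporary: show the single SELECT with its three LEFT OUTER JOINs
# against the comment-count subqueries.
print(final_query)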

This code will act as the foundation for both of our scripts.

Add the CSV Export to `export_csv.py`

Now that we have our final query, we need to build a csv.DictWriter and iterate through the rows that are returned from our query. Let’s set up the writer first:

export_csv.py

# Other imports omitted
import csv

# Query code omitted

csv_file = open("forum_export.csv", mode="w")
fields = [
    "id",
    "body",
    "author_name",
    "created_on",
    "comments",
    "positive_comments",
    "negative_comments",
]
csv_writer = csv.DictWriter(csv_file, fieldnames=fields)
csv_writer.writeheader()

Now we have an open file and a CSV writer, and we've written the field names as a header row. The last thing we need to do is iterate through final_query and write a CSV row for each result using csv_writer.writerow.

export_csv.py

# Previous code omitted

# Each result is a tuple of the Post and its three comment counts (None when there were no matching comments)
for post, comments, negative_comments, positive_comments in final_query:
    csv_writer.writerow({
        "id": post.id,
        "body": post.body,
        "author_name": post.author_name,
        "created_on": post.created_on.date(),
        "comments": comments or 0,
        "positive_comments": positive_comments or 0,
        "negative_comments": negative_comments or 0
    })

csv_file.close()

There are a few modifications that we need to make to the data returned from our query so that it can be written to CSV:

  • The created_on timestamp needs to be turned into a date.
  • Each comment count needs to default to 0 when no value came back for that post (both changes are illustrated in the short REPL sketch below).
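
Both of these are plain Python: calling .date() on the timestamp drops the time portion, and value or 0 falls back to 0 when the outer join produced no matching row (in which case the count is None). A quick REPL sketch:

>>> from datetime import datetime
>>> datetime(2019, 7, 4, 12, 30).date()
datetime.date(2019, 7, 4)
>>> None or 0
0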

The last thing that we need to do is run the script, passing in our database connection string as DB_URL:

(forum) $ DB_URL=postgres://admin:password@PUBLIC_IP:80/forum python export_csv.py

Now we should have properly formatted rows within forum_export.csv.
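
We can spot-check the first few lines of the file (the rows themselves will depend on what's in the database):

(forum) $ head -n 3 forum_export.csv
id,body,author_name,created_on,comments,positive_comments,negative_comments
...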

Create the `export_json.py` Script

For the JSON export, we’ll use much of the same code as export_csv.py, except that we’ll rely on the json module instead of the csv module to write out our results.

Let’s create export_json.py. We can do this by making a copy of export_csv.py and removing the CSV-related logic:

(forum) $ cp export_csv.py export_json.py

export_json.py

import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

db_url = os.environ["DB_URL"]
engine = create_engine(db_url)
Session = sessionmaker(bind=engine)
session = Session()

from sqlalchemy.sql import func

from forum.models import Post, Comment

comments = (
    session.query(Comment.post_id, func.count("*").label("comments"))
    .group_by(Comment.post_id)
    .subquery()
)

negative_comments = (
    session.query(Comment.post_id, func.count("*").label("negative_comments"))
    .filter(Comment.sentiment == "negative")
    .group_by(Comment.post_id)
    .subquery()
)

positive_comments = (
    session.query(Comment.post_id, func.count("*").label("positive_comments"))
    .filter(Comment.sentiment == "positive")
    .group_by(Comment.post_id)
    .subquery()
)

final_query = (
    session.query(
        Post,
        comments.c.comments,
        negative_comments.c.negative_comments,
        positive_comments.c.positive_comments,
    )
    .outerjoin(comments, Post.id == comments.c.post_id)
    .outerjoin(negative_comments, Post.id == negative_comments.c.post_id)
    .outerjoin(positive_comments, Post.id == positive_comments.c.post_id)
)

for post, comments, negative_comments, positive_comments in final_query:
    {
        "id": post.id,
        "body": post.body,
        "author_name": post.author_name,
        "created_on": post.created_on.date(),
        "comments": comments or 0,
        "positive_comments": positive_comments or 0,
        "negative_comments": negative_comments or 0,
    }

Since we want the JSON objects to contain the same information as the CSV rows, we've kept the dictionary inside our loop; for now it's built and then immediately thrown away.

To finish this script we need to:

  1. Create an empty list to hold the items.
  2. Add the dictionary to the items list.
  3. Dump the items list as JSON to forum_export.json using the json.dump function.

Let’s add this now:

export_json.py

# Previous code omitted
import json

items = []

for post, comments, negative_comments, positive_comments in final_query:
    items.append(
        {
            "id": post.id,
            "body": post.body,
            "author_name": post.author_name,
            "created_on": str(post.created_on.date()),
            "comments": comments or 0,
            "positive_comments": positive_comments or 0,
            "negative_comments": negative_comments or 0,
        }
    )

with open("forum_export.json", mode="w") as f:
    json.dump(items, f)

Now we can export the JSON using a command similar to the one we used for the CSV export:

(forum) $ DB_URL=postgres://admin:password@PUBLIC_IP:80/forum python export_json.py
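
To quickly confirm that the file contains valid JSON, we can pretty-print it with the standard library's json.tool module (the output will depend on the data):

(forum) $ python -m json.tool forum_export.json
...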

We’ve successfully written scripts for all of the exporting that was requested of us.

Additional Resources

We've been tasked with exporting some information from our forum database for a few different people in our organization. Someone in the marketing department would like us to generate a CSV file for them with a line for each post that includes the number of comments the post has, the number of comments with positive sentiment, and the number of comments with negative sentiment. In addition, one of our development team interns has requested this same information in JSON format.

We're going to create two separate scripts that will utilize the forum package that we've written. This library provides us with the Post and Comment SQLAlchemy models that we need to easily interact with our database. These models can be found in ~/forum/forum/models.py.
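
The real definitions live in ~/forum/forum/models.py, but as a rough sketch of the kind of declarative models we're assuming (the table names and column types here are guesses based on the fields the exports rely on, not the package's actual code):

from sqlalchemy import Column, DateTime, ForeignKey, Integer, Text
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Post(Base):
    __tablename__ = "posts"  # assumed table name

    id = Column(Integer, primary_key=True)
    body = Column(Text)
    author_name = Column(Text)
    created_on = Column(DateTime)

class Comment(Base):
    __tablename__ = "comments"  # assumed table name

    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, ForeignKey("posts.id"))
    sentiment = Column(Text)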

Here's the CSV header and an example row that would meet our coworkers’ requirements:

id,body,author_name,created_on,comments,positive_comments,negative_comments
1,This is a great tool,Kevin Bacon,2019-07-04,10,8,2

For the JSON output, we'll need to write an array of JSON objects that would look like this:

[
    {
        "id": 1,
        "body": "This is a great tool",
        "author_name": "Kevin Bacon",
        "created_on": "2019-07-04",
        "comments": 10,
        "positive_comments": 8,
        "negative_comments": 2
    },
    // ... more JSON objects
]

If you'd like to follow along on your own workstation, you can clone this repository and rename the directory forum.

To feel comfortable completing this lab, you'll need to know how to do the following:

  • Create a SQLAlchemy engine and session. Watch "Configuring a SQLAlchemy Connection" from the Programming Use Cases with Python course if you're unfamiliar with how to do this.
  • Query a database using SQLAlchemy models. Watch "Utilize the Library to Export CSV" from the Programming Use Cases with Python course if you're unfamiliar with how to do this.
  • Write CSV and JSON documents. Watch the "Utilize the Library to Export CSV" and "Exporting Data as JSON" videos from the Programming Use Cases with Python course if you're unfamiliar with how to do this.
