Forum Export: Exporting Data as JSON and CSV Using Python

45 minutes
  • 4 Learning Objectives

About this Hands-on Lab

Databases are great for storing information and letting us query it, but raw SQL and the default row shape aren't always the best options for people within our organization. If we know the schema of our database tables, we can create classes to model our data and then massage that data into other shapes that are more helpful for specific use cases. In this hands-on lab, we'll use an existing SQLAlchemy-based library to query our forum database and export data as both JSON and CSV for other employees in our organization.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create a Virtualenv and Install the `forum` Package

We have the source for our forum package downloaded to ~/forum. We're going to create a new Python 3.7 virtualenv and install the package locally. The project used Pipenv during development, so we can use it now to create the virtualenv and install the dependencies:

$ cd ~/forum
$ pipenv install
...

With the virtualenv in place, we just need to activate it and then install the forum package itself:

$ pipenv shell
...
(forum) $ pip install -e .
...

Now we can use the forum library from within Python scripts and the REPL.
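
As a quick sanity check that the editable install worked, we can open a REPL and import the models we'll be using later (exit() drops us back to the shell):

(forum) $ python
...
>>> from forum.models import Post, Comment
>>> exit()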

Write the Posts Query in the `export_csv.py` Script

The foundation of both of our export scripts will be the same, so let’s write that first. We need to do the following:

  • Create a SQLAlchemy engine to connect to our database server.
  • Create a Session class that is bound to our engine.
  • Create an instance of our Session class to make queries to the database.

This is all done with SQLAlchemy itself rather than relying on code in the forum package. We're also going to pass the database connection string in via a DB_URL environment variable when we run the script:

export_csv.py

import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

db_url = os.environ["DB_URL"]
engine = create_engine(db_url)
Session = sessionmaker(bind=engine)
session = Session()

With the SQLAlchemy boilerplate out of the way, we’re now ready to use our models. Because we need some information about the comments table but want to receive a row for each post, we’ll need to create a few different subqueries to calculate the various comment counts. Let’s write these subqueries now:

export_csv.py

# SQLAlchemy configuration omitted

from sqlalchemy.sql import func

from forum.models import Post, Comment

comments = (
    session.query(Comment.post_id, func.count("*").label("comments"))
    .group_by(Comment.post_id)
    .subquery()
)

negative_comments = (
    session.query(Comment.post_id, func.count("*").label("negative_comments"))
    .filter(Comment.sentiment == "negative")
    .group_by(Comment.post_id)
    .subquery()
)

positive_comments = (
    session.query(Comment.post_id, func.count("*").label("positive_comments"))
    .filter(Comment.sentiment == "positive")
    .group_by(Comment.post_id)
    .subquery()
)

We’ll use these three subqueries as part of the main query that we’ll be using to get the posts.

export_csv.py

# Previous code omitted

final_query = (
    session.query(
        Post,
        comments.c.comments,
        negative_comments.c.negative_comments,
        positive_comments.c.positive_comments,
    )
    .outerjoin(comments, Post.id == comments.c.post_id)
    .outerjoin(negative_comments, Post.id == negative_comments.c.post_id)
    .outerjoin(positive_comments, Post.id == positive_comments.c.post_id)
)

This query is a bit complicated, but it gets us exactly the information we want, and the amount of SQL we issue stays constant no matter how many posts and comments there are, because the three subqueries are folded into the main SELECT.
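
If we're curious what SQLAlchemy will actually send to the database, we can print the query object, which compiles it to SQL. This is just a temporary line for inspection, not part of the final script, and the exact SQL text will vary:

# Temporary: show the single SELECT with its three LEFT OUTER JOINs
# against the comment-count subqueries.
print(final_query)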

This code will act as the foundation for both of our scripts.

Add the CSV Export to `export_csv.py`

Now that we have our final query, we need to build a csv.DictWriter and iterate through the rows that are returned from our query. Let’s set up the writer first:

export_csv.py

# Other imports omitted
import csv

# Query code omitted

csv_file = open("forum_export.csv", mode="w")
fields = [
    "id",
    "body",
    "author_name",
    "created_on",
    "comments",
    "positive_comments",
    "negative_comments",
]
csv_writer = csv.DictWriter(csv_file, fieldnames=fields)
csv_writer.writeheader()

Now we have an open file and a CSV writer, and we've written the field names as a header row. The last thing we need to do is iterate through final_query and write a CSV row for each result using csv_writer.writerow.

export_csv.py

# Previous code omitted

# Each result is a tuple of the Post and its three comment counts (None when there were no matching comments)
for post, comments, negative_comments, positive_comments in final_query:
    csv_writer.writerow({
        "id": post.id,
        "body": post.body,
        "author_name": post.author_name,
        "created_on": post.created_on.date(),
        "comments": comments or 0,
        "positive_comments": positive_comments or 0,
        "negative_comments": negative_comments or 0
    })

csv_file.close()

There are a few modifications that we need to make to the data returned from our query so that it can be written to CSV:

  • The created_on timestamp needs to be turned into a date.
  • Each comment count needs to default to 0 when no value came back for that post (both changes are illustrated in the short REPL sketch below).
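
Both of these are plain Python: calling .date() on the timestamp drops the time portion, and value or 0 falls back to 0 when the outer join produced no matching row (in which case the count is None). A quick REPL sketch:

>>> from datetime import datetime
>>> datetime(2019, 7, 4, 12, 30).date()
datetime.date(2019, 7, 4)
>>> None or 0
0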

The last thing that we need to do is run the script, passing in our database connection string as DB_URL:

(forum) $ DB_URL=postgres://admin:password@PUBLIC_IP:80/forum python export_csv.py

Now we should have properly formatted rows within forum_export.csv.
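
We can spot-check the first few lines of the file (the rows themselves will depend on what's in the database):

(forum) $ head -n 3 forum_export.csv
id,body,author_name,created_on,comments,positive_comments,negative_comments
...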

Create the `export_json.py` Script

For the JSON export, we’ll use much of the same code as export_csv.py, except that we’ll rely on the json module instead of the csv module to write out our results.

Let’s create export_json.py. We can do this by making a copy of export_csv.py and removing the CSV-related logic:

(forum) $ cp export_csv.py export_json.py

export_json.py

import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

db_url = os.environ["DB_URL"]
engine = create_engine(db_url)
Session = sessionmaker(bind=engine)
session = Session()

from sqlalchemy.sql import func

from forum.models import Post, Comment

comments = (
    session.query(Comment.post_id, func.count("*").label("comments"))
    .group_by(Comment.post_id)
    .subquery()
)

negative_comments = (
    session.query(Comment.post_id, func.count("*").label("negative_comments"))
    .filter(Comment.sentiment == "negative")
    .group_by(Comment.post_id)
    .subquery()
)

positive_comments = (
    session.query(Comment.post_id, func.count("*").label("positive_comments"))
    .filter(Comment.sentiment == "positive")
    .group_by(Comment.post_id)
    .subquery()
)

final_query = (
    session.query(
        Post,
        comments.c.comments,
        negative_comments.c.negative_comments,
        positive_comments.c.positive_comments,
    )
    .outerjoin(comments, Post.id == comments.c.post_id)
    .outerjoin(negative_comments, Post.id == negative_comments.c.post_id)
    .outerjoin(positive_comments, Post.id == positive_comments.c.post_id)
)

for post, comments, negative_comments, positive_comments in final_query:
    {
        "id": post.id,
        "body": post.body,
        "author_name": post.author_name,
        "created_on": post.created_on.date(),
        "comments": comments or 0,
        "positive_comments": positive_comments or 0,
        "negative_comments": negative_comments or 0,
    }

Since we want the JSON objects to contain the same information as the CSV rows, we've kept the dictionary inside our loop; for now it's built and then immediately thrown away.

To finish this script we need to:

  1. Create an empty list to hold the items.
  2. Add the dictionary to the items list.
  3. Dump the items list as JSON to forum_export.json using the json.dump function.

Let’s add this now:

export_json.py

# Previous code omitted
import json

items = []

for post, comments, negative_comments, positive_comments in final_query:
    items.append(
        {
            "id": post.id,
            "body": post.body,
            "author_name": post.author_name,
            "created_on": str(post.created_on.date()),
            "comments": comments or 0,
            "positive_comments": positive_comments or 0,
            "negative_comments": negative_comments or 0,
        }
    )

with open("forum_export.json", mode="w") as f:
    json.dump(items, f)

Now we can export the JSON using a command similar to the one we used for the CSV export:

(forum) $ DB_URL=postgres://admin:password@PUBLIC_IP:80/forum python export_json.py
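
To quickly confirm that the file contains valid JSON, we can pretty-print it with the standard library's json.tool module (the output will depend on the data):

(forum) $ python -m json.tool forum_export.json
...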

We’ve successfully written scripts for all of the exporting that was requested of us.

Additional Resources

We've been tasked with exporting some information from our forum database for a few different people in our organization. Someone in the marketing department would like us to generate a CSV file for them with a line for each post that includes the number of comments the post has, the number of comments with positive sentiment, and the number of comments with negative sentiment. In addition, one of our development team interns has requested this same information in JSON format.

We're going to create two separate scripts that will utilize the forum package that we've written. This library provides us with the Post and Comment SQLAlchemy models that we need to easily interact with our database. These models can be found in ~/forum/forum/models.py.
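
The real definitions live in ~/forum/forum/models.py, but as a rough sketch of the kind of declarative models we're assuming (the table names and column types here are guesses based on the fields the exports rely on, not the package's actual code):

from sqlalchemy import Column, DateTime, ForeignKey, Integer, Text
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Post(Base):
    __tablename__ = "posts"  # assumed table name

    id = Column(Integer, primary_key=True)
    body = Column(Text)
    author_name = Column(Text)
    created_on = Column(DateTime)

class Comment(Base):
    __tablename__ = "comments"  # assumed table name

    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, ForeignKey("posts.id"))
    sentiment = Column(Text)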

Here's the CSV header and an example row that would meet our coworkers’ requirements:

id,body,author_name,created_on,comments,positive_comments,negative_comments
1,This is a great tool,Kevin Bacon,2019-07-04,10,8,2

For the JSON output, we'll need to write an array of JSON objects that would look like this:

[
    {
        "id": 1,
        "body": "This is a great tool",
        "author_name": "Kevin Bacon",
        "created_on": "2019-07-04",
        "comments": 10,
        "positive_comments": 8,
        "negative_comments": 2
    },
    // ... more JSON objects
]

If you'd like to follow along on your own workstation, you can clone this repository and rename the directory forum.

To feel comfortable completing this lab, you'll need to know how to do the following:

  • Create a SQLAlchemy engine and session. Watch "Configuring a SQLAlchemy Connection" from the Programming Use Cases with Python course if you're unfamiliar with how to do this.
  • Query a database using SQLAlchemy models. Watch "Utilize the Library to Export CSV" from the Programming Use Cases with Python course if you're unfamiliar with how to do this.
  • Write CSV and JSON documents. Watch the "Utilize the Library to Export CSV" and "Exporting Data as JSON" videos from the Programming Use Cases with Python course if you're unfamiliar with how to do this.
