A Cloud Guru Hands-on Lab
Forum Export: Exporting Data as JSON and CSV Using Python

Databases are great for storing information and letting us query it, but the default shape of the data and raw SQL are not always the best options for people within our organization. If we know the schema of our database tables, we can create classes to model our data and then massage the data into other shapes that are more helpful for specific use cases. In this hands-on lab, we'll use an existing SQLAlchemy library to query our forum database and export data as both JSON and CSV for other employees in our organization.


Path Info

Level: Intermediate
Duration: 45m
Published: Aug 26, 2019


Table of Contents

  1. Challenge

    Create a Virtualenv and Install the `forum` Package

    We have the source for our forum package downloaded to ~/forum. We're going to create a new Python 3.7 virtualenv and install the package locally. This project utilized Pipenv in development, and we can use that now to create the virtualenv and install the dependencies:

    $ cd ~/forum
    $ pipenv install
    ...
    

    With the virtualenv in place, we just need to activate it and then install the forum package itself:

    $ pipenv shell
    ...
    (forum) $ pip install -e .
    ...
    

    Now we can use the forum library from within Python scripts and the REPL.

  2. Challenge

    Write the Posts Query in the `export_csv.py` Script

    The foundation of both of our export scripts will be the same, so let's write that first. We need to do the following:

    • Create a SQLAlchemy engine to connect to our database server.
    • Create a Session class that is bound to our engine.
    • Create an instance of our Session class to make queries to the database.

    This is all done using SQLAlchemy itself instead of relying on code in the forum package. We're also going to pass the database connection in using a DB_URL environment variable when we run the script:

    export_csv.py

    import os
    
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker
    
    db_url = os.environ["DB_URL"]
    engine = create_engine(db_url)
    Session = sessionmaker(bind=engine)
    session = Session()
    

    With the SQLAlchemy boilerplate out of the way, we're now ready to use our models. Because we need some information about the comments table but want to receive a row for each post, we'll need to create a few different subqueries to calculate the various comment counts. Let's write these subqueries now:

    export_csv.py

    # SQLAlchemy configuration omitted
    
    from sqlalchemy.sql import func
    
    from forum.models import Post, Comment
    
    comments = (
        session.query(Comment.post_id, func.count("*").label("comments"))
        .group_by(Comment.post_id)
        .subquery()
    )
    
    negative_comments = (
        session.query(Comment.post_id, func.count("*").label("negative_comments"))
        .filter(Comment.sentiment == "negative")
        .group_by(Comment.post_id)
        .subquery()
    )
    
    positive_comments = (
        session.query(Comment.post_id, func.count("*").label("positive_comments"))
        .filter(Comment.sentiment == "positive")
        .group_by(Comment.post_id)
        .subquery()
    )
    

    We'll use these three subqueries as part of the main query that we'll be using to get the posts.

    export_csv.py

    # Previous code omitted
    
    final_query = (
        session.query(
            Post,
            comments.c.comments,
            negative_comments.c.negative_comments,
            positive_comments.c.positive_comments,
        )
        .outerjoin(comments, Post.id == comments.c.post_id)
        .outerjoin(negative_comments, Post.id == negative_comments.c.post_id)
        .outerjoin(positive_comments, Post.id == positive_comments.c.post_id)
    )
    

    This query is a bit complicated, but it gets the information that we want out of the database in a single round trip: the three subqueries run as part of the final statement, so the cost stays fixed regardless of how many posts and comments there are.

    This code will act as the foundation for both of our scripts.

  3. Challenge

    Add the CSV Export to `export_csv.py`

    Now that we have our final query, we need to build a csv.DictWriter and iterate through the rows that are returned from our query. Let's set up the writer first:

    export_csv.py

    # Other imports omitted
    import csv
    
    # Query code omitted
    
    csv_file = open("forum_export.csv", mode="w")
    fields = [
        "id", "body", "author_name", "created_on",
        "comments", "positive_comments", "negative_comments",
    ]
    csv_writer = csv.DictWriter(csv_file, fieldnames=fields)
    csv_writer.writeheader()
    

    Now we have an open file and a CSV writer, and we've written the field names as a header row. The last thing that we need to do is iterate through final_query and write a CSV row for each result using csv_writer.writerow.

    export_csv.py

    # Previous code omitted
    
    for post, comments, negative_comments, positive_comments in final_query:
        csv_writer.writerow({
            "id": post.id,
            "body": post.body,
            "author_name": post.author_name,
            "created_on": post.created_on.date(),
            "comments": comments or 0,
            "positive_comments": positive_comments or 0,
            "negative_comments": negative_comments or 0
        })
    
    csv_file.close()
    

    There are a few modifications that we need to make to the data returned from our query so that it can be written to CSV:

    • The timestamp for created_on needs to be turned into a date.
    • For each comment value, we need to add a default 0 if there were none returned for the particular post.
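    These two fix-ups are easy to see in isolation. Here's a small sketch using made-up values (not rows from the lab database) showing how the date conversion and the `or 0` default behave:

    ```python
    from datetime import datetime

    # created_on comes back as a full timestamp; the CSV only needs the date.
    created_on = datetime(2019, 8, 26, 14, 30, 5)
    print(created_on.date())  # 2019-08-26

    # A post with no comments gets NULL counts from the outer joins, which
    # SQLAlchemy hands back as None; `or 0` keeps the CSV columns numeric.
    comment_count = None
    print(comment_count or 0)  # 0
    ```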

    The last thing that we need to do is run this:

    (forum) $ DB_URL=postgres://admin:password@PUBLIC_IP:80/forum python export_csv.py
    

    Now we should have properly formatted rows within forum_export.csv.
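    If we want to double-check the output shape, csv.DictReader reads rows back as dictionaries. Here's a minimal round-trip sketch with a hypothetical row (note that csv hands every value back as a string):

    ```python
    import csv
    import io

    fields = [
        "id", "body", "author_name", "created_on",
        "comments", "positive_comments", "negative_comments",
    ]

    # A made-up row shaped like one line of forum_export.csv.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerow({"id": 1, "body": "Hello", "author_name": "Kevin",
                     "created_on": "2019-08-26", "comments": 3,
                     "positive_comments": 2, "negative_comments": 0})

    # Reading it back: every field, including the counts, is now a string.
    buf.seek(0)
    rows = list(csv.DictReader(buf))
    print(rows[0]["comments"])  # "3"
    ```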

  4. Challenge

    Create the `export_json.py` Script

    For the JSON export, we'll use much of the same code as export_csv.py, except that we'll rely on the json module instead of the csv module to write out our results.

    Let's create export_json.py. We can do this by making a copy of export_csv.py and removing the CSV-related logic:

    (forum) $ cp export_csv.py export_json.py
    

    export_json.py

    import os
    
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker
    
    db_url = os.environ["DB_URL"]
    engine = create_engine(db_url)
    Session = sessionmaker(bind=engine)
    session = Session()
    
    from sqlalchemy.sql import func
    
    from forum.models import Post, Comment
    
    comments = (
        session.query(Comment.post_id, func.count("*").label("comments"))
        .group_by(Comment.post_id)
        .subquery()
    )
    
    negative_comments = (
        session.query(Comment.post_id, func.count("*").label("negative_comments"))
        .filter(Comment.sentiment == "negative")
        .group_by(Comment.post_id)
        .subquery()
    )
    
    positive_comments = (
        session.query(Comment.post_id, func.count("*").label("positive_comments"))
        .filter(Comment.sentiment == "positive")
        .group_by(Comment.post_id)
        .subquery()
    )
    
    final_query = (
        session.query(
            Post,
            comments.c.comments,
            negative_comments.c.negative_comments,
            positive_comments.c.positive_comments,
        )
        .outerjoin(comments, Post.id == comments.c.post_id)
        .outerjoin(negative_comments, Post.id == negative_comments.c.post_id)
        .outerjoin(positive_comments, Post.id == positive_comments.c.post_id)
    )
    
    for post, comments, negative_comments, positive_comments in final_query:
        {
            "id": post.id,
            "body": post.body,
            "author_name": post.author_name,
            "created_on": post.created_on.date(),
            "comments": comments or 0,
            "positive_comments": positive_comments or 0,
            "negative_comments": negative_comments or 0,
        }
    

    Since we want each JSON object to carry the same information as a CSV row, we've kept the dictionary inside the loop. As written, though, the dictionary is built and immediately discarded on each iteration; we still need to collect the results.

    To finish this script we need to:

    1. Create an empty list to hold the items.
    2. Add the dictionary to the items list.
    3. Dump the items list as JSON to forum_export.json using the json.dump function.

    Let's add this now:

    export_json.py

    # previous code omitted
    import json
    
    items = []
    
    for post, comments, negative_comments, positive_comments in final_query:
        items.append(
            {
                "id": post.id,
                "body": post.body,
                "author_name": post.author_name,
                "created_on": str(post.created_on.date()),
                "comments": comments or 0,
                "positive_comments": positive_comments or 0,
                "negative_comments": negative_comments or 0,
            }
        )
    
    with open("forum_export.json", mode="w") as f:
        json.dump(items, f)
    
    

    Now we can export the JSON using a command similar to the one we used for the CSV export:

    (forum) $ DB_URL=postgres://admin:password@PUBLIC_IP:80/forum python export_json.py
    

    We've successfully written scripts for all of the exporting that was requested of us.
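    As a quick sanity check, the JSON file can be loaded back with json.load. Unlike CSV, the numeric fields survive the round trip as numbers. Here's a sketch with a hypothetical item, using an in-memory buffer in place of forum_export.json:

    ```python
    import io
    import json

    items = [{"id": 1, "body": "Hello", "author_name": "Kevin",
              "created_on": "2019-08-26", "comments": 3,
              "positive_comments": 2, "negative_comments": 0}]

    # Round-trip through a file-like buffer the same way the script
    # writes forum_export.json.
    buf = io.StringIO()
    json.dump(items, buf)
    buf.seek(0)
    loaded = json.load(buf)
    print(loaded == items)  # True: ints stay ints, unlike the CSV export
    ```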
