Loading and Retrieving Data in Neptune

1.5 hours
  • 3 Learning Objectives

About this Hands-on Lab

In this lab, you will load data from an S3 bucket into an existing Neptune instance using the bulk load feature. This is far more efficient than executing a large number of `INSERT` statements, `addVertex` and `addEdge` steps, or other API calls. The Neptune instance will be available when you start the lab. However, you will need to create an IAM role and an S3 bucket, so prior knowledge of the IAM and S3 services is suggested.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create an S3 Bucket and Grant Access
  1. In the AWS Management Console, search for S3 service and select it.

  2. Click Create bucket.

  3. Set the following settings:

    • Bucket name: "neptune-import09232020"
    • Region: US East (N. Virginia)
  4. Select Next.

  5. Add tag "name" for Key and "neptune-import" for Value.

  6. Select Next >> Next >> Create bucket.

  7. Visit this lab’s content repo and download the neptune-data.rdf file to your local machine.

  8. Click on the bucket’s name and select Upload.

  9. Click Add files.

  10. Select the neptune-data.rdf from your machine.

  11. Select Upload.

  12. Click on Services, and select the IAM service.

  13. Select Roles >> Create role >> S3 >> Next: Permissions.

  14. In the search bar, type "S3", select AmazonS3ReadOnlyAccess, and select Next: Tags >> Next: Review.

  15. On the Create role page:

    • Role name: "neptune-import".
    • Click Create role.
  16. In the list of roles, search for and select the neptune-import role.

  17. Click Trust relationships >> Edit trust relationship, and edit the Service: line to read "Service": "rds.amazonaws.com".

  18. Select Update Trust Policy.

  19. Click on Services, and navigate back to the Neptune service.

  20. Select the cluster’s name, Actions >> Manage IAM roles.

  21. On the Manage IAM roles page, ensure the neptune-import role is listed under Current IAM roles for this cluster. If it is not, select neptune-import from the dropdown and click Add role.

  22. Select Done.
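After step 17, the role's trust policy should look similar to the sketch below (account-specific fields are omitted). This is the standard trust relationship that allows Neptune, via the RDS service principal, to assume the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rds.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```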

Load the Data
  1. In the AWS Management Console search for VPC service and select it.
  2. Click on Endpoints >> Create Endpoint.
  3. Set the following settings:
    • Service category: Find service by name.
    • Service Name: "com.amazonaws.us-east-1.s3", and click Verify.
    • VPC: Select the existing VPC from the dropdown menu.
    • Route Table ID: Select the ID with the two subnets.
    • Click the Create endpoint button.
  4. In the AWS Management Console search for Neptune service and select it.
  5. Select the name of the cluster, and copy the Cluster endpoint name. (Note the port number, 8182, next to the Cluster endpoint. You will need this info momentarily.)
  6. Search for IAM service and select it. Click Roles and search for "neptune".
  7. Select the neptune-import role and copy its Role ARN. You will need this info momentarily.
  8. Connect to the bastion host using the credentials provided, and save the cluster endpoint name in a variable.
  9. Install curl with https support.
  10. Use curl to submit the load request to the Neptune loader endpoint. Be sure to replace the iamRoleArn value with your role's ARN.
  11. Press Enter. If successful, a "200 OK" status will appear.
  12. Copy the loadId (from the 200 OK response) so you can monitor the load job.
  13. Press Enter. If successful, a "200 OK" status will also appear.
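Steps 8–13 on the bastion host can be sketched as follows. The endpoint, bucket name, and role ARN below are placeholders (substitute the values you copied earlier), and the two curl calls are shown as comments because they only succeed from inside the cluster's VPC:

```shell
#!/bin/sh
# Placeholder values - substitute the endpoint, bucket, and ARN you copied earlier.
NEPTUNE_ENDPOINT="your-cluster.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com"
BUCKET="neptune-import09232020"
ROLE_ARN="arn:aws:iam::123456789012:role/neptune-import"

# Request body for the Neptune bulk loader ("rdfxml" matches the .rdf file).
cat > loader-request.json <<EOF
{
  "source": "s3://${BUCKET}/neptune-data.rdf",
  "format": "rdfxml",
  "iamRoleArn": "${ROLE_ARN}",
  "region": "us-east-1",
  "failOnError": "FALSE"
}
EOF

# Submit the load job; a 200 OK response includes a loadId:
# curl -X POST -H 'Content-Type: application/json' \
#      "https://${NEPTUNE_ENDPOINT}:8182/loader" -d @loader-request.json

# Monitor the job with the returned loadId:
# curl "https://${NEPTUNE_ENDPOINT}:8182/loader/<loadId>"

cat loader-request.json
```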
Query the Data
  1. Download the RDF4J client.
  2. Extract the client.
  3. Create a SPARQL repo. Be sure to include your Neptune endpoint and append :8182/sparql at the end of the line.
  4. Type "yes" to overwrite. If successful, a repository created message should appear.
  5. Open the repo to view the submitted data via the S3 bucket.
  6. Query the data.
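As a quick sanity check in step 6, a minimal SPARQL query (a generic triple pattern, not specific to this data set) run from the RDF4J console should return the first few loaded triples:

```sparql
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
```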

Additional Resources

You are working as a Database Administrator in charge of the company's Neptune graph database. The developers have attempted to load a large amount of data using `INSERT` statements, but this has been unsuccessful. They are looking for a bulk data loading solution and have provided you with a small sample data set.

Please note, we will be using the RDF4J command-line client to query the database. A stripped-down copy of this client is included in the content repository for this lab here.

You can find the latest release of the full SDK here.

Note: On the instance, install curl with https support:

    sudo yum install libcurl.x86_64
