Loading and Retrieving Data in Neptune

1.5 hours
  • 3 Learning Objectives

About this Hands-on Lab

In this lab, you will load data from an S3 bucket into an existing Neptune instance using the bulk load feature. This is far more efficient than executing a large number of `INSERT` statements, `addVertex` and `addEdge` steps, or other API calls. The Neptune instance will be available when you start the lab. However, you will need to create an IAM role and an S3 bucket, so prior knowledge of the IAM and S3 services is suggested.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create an S3 Bucket and Grant Access
  1. In the AWS Management Console, search for S3 service and select it.

  2. Click Create bucket.

  3. Set the following settings:

    • Bucket name: "neptune-import09232020"
    • Region: US East (N. Virginia)
  4. Select Next.

  5. Add tag "name" for Key and "neptune-import" for Value.

  6. Select Next >> Next >> Create bucket.

  7. Visit this lab’s content repo and download the neptune-data.rdf file to your local machine.

  8. Click on the bucket’s name and select Upload.

  9. Click Add files.

  10. Select the neptune-data.rdf from your machine.

  11. Select Upload.

  12. Click on Services, and select the IAM service.

  13. Select Roles >> Create role >> S3 >> Next: Permissions.

  14. In the search bar, type "S3", select AmazonS3ReadOnlyAccess, and select Next: Tags >> Next: Review.

  15. On the Create role page:

    • Role name: "neptune-import".
    • Click Create role.
  16. In the list of roles, search for and select the neptune-import role.

  17. Click Trust relationships >> Edit trust relationship, and edit the Service: line to read "Service": "rds.amazonaws.com".

  18. Select Update Trust Policy.

  19. Click on Services, and navigate back to the Neptune service.

  20. Select the cluster’s name, Actions >> Manage IAM roles.

  21. On the Manage IAM roles page, ensure the neptune-import role is listed under Current IAM roles for this cluster. If it is not, select neptune-import from the dropdown and click Add role.

  22. Select Done.
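After step 17, the role's trust policy should look similar to the sketch below (account-specific fields are omitted). This is the standard trust relationship that allows Neptune, via the RDS service principal, to assume the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rds.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```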

Load the Data
  1. In the AWS Management Console search for VPC service and select it.
  2. Click on Endpoints >> Create Endpoint.
  3. Set the following settings:
    • Service category: Find service by name.
    • Service Name: "com.amazonaws.us-east-1.s3", and click Verify.
    • VPC: Select the existing VPC from the dropdown menu.
    • Route Table ID: Select the ID with the two subnets.
    • Click the Create endpoint button.
  4. In the AWS Management Console search for Neptune service and select it.
  5. Select the name of the cluster, and copy the Cluster endpoint name. (Note the port number, 8182, next to the Cluster endpoint. You will need this info momentarily.)
  6. Search for IAM service and select it. Click Roles and search for "neptune".
  7. Select the neptune-import role and copy its Role ARN. You will need this info momentarily.
  8. Connect to the bastion host using the credentials provided, and save the cluster endpoint name in a variable.
  9. Install curl with https support.
  10. Use curl to submit the load request to the Neptune loader endpoint. Be sure to replace the iamRoleArn value with your role's ARN.
  11. Press Enter. If successful, a "200 OK" status will appear.
  12. Copy the loadId (from the 200 OK response) so you can monitor the load job.
  13. Press Enter. If successful, a "200 OK" status will also appear.
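Steps 8–13 on the bastion host can be sketched as follows. The endpoint, bucket name, and role ARN below are placeholders (substitute the values you copied earlier), and the two curl calls are shown as comments because they only succeed from inside the cluster's VPC:

```shell
#!/bin/sh
# Placeholder values - substitute the endpoint, bucket, and ARN you copied earlier.
NEPTUNE_ENDPOINT="your-cluster.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com"
BUCKET="neptune-import09232020"
ROLE_ARN="arn:aws:iam::123456789012:role/neptune-import"

# Request body for the Neptune bulk loader ("rdfxml" matches the .rdf file).
cat > loader-request.json <<EOF
{
  "source": "s3://${BUCKET}/neptune-data.rdf",
  "format": "rdfxml",
  "iamRoleArn": "${ROLE_ARN}",
  "region": "us-east-1",
  "failOnError": "FALSE"
}
EOF

# Submit the load job; a 200 OK response includes a loadId:
# curl -X POST -H 'Content-Type: application/json' \
#      "https://${NEPTUNE_ENDPOINT}:8182/loader" -d @loader-request.json

# Monitor the job with the returned loadId:
# curl "https://${NEPTUNE_ENDPOINT}:8182/loader/<loadId>"

cat loader-request.json
```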
Query the Data
  1. Download the RDF4J client.
  2. Extract the client.
  3. Create a SPARQL repo. Be sure to include your Neptune endpoint and append :8182/sparql at the end of the line.
  4. Type "yes" to overwrite. If successful, a repository created message should appear.
  5. Open the repo to view the submitted data via the S3 bucket.
  6. Query the data.
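As a quick sanity check in step 6, a minimal SPARQL query (a generic triple pattern, not specific to this data set) run from the RDF4J console should return the first few loaded triples:

```sparql
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
```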

Additional Resources

You are working as a Database Administrator in charge of the company's Neptune graph database. The developers have attempted to load a large amount of data using `INSERT` statements, but this has been unsuccessful. They are looking for a bulk data loading solution and have provided you with a small sample data set.

Please note, we will be using the RDF4J command-line client to query the database. A stripped-down copy of this client is included in the content repository for this lab here.

You can find the latest release of the full SDK here.

Note: On the instance, install curl with https support:

    sudo yum install libcurl.x86_64
