In this lab, you will load data from an S3 bucket into an existing Neptune instance using the bulk load feature. This is far more efficient than executing a large number of `INSERT` statements, `addVertex` and `addEdge` steps, or other API calls. The Neptune instance will be available when you start the lab. However, you will need to create an IAM role and an S3 bucket, so prior knowledge of the IAM and S3 services is recommended.
Successfully complete this lab by achieving the following learning objectives:
- Create an S3 Bucket and Grant Access
In the AWS Management Console, search for the S3 service and select it.
Click Create bucket.
Set the following settings:
- Bucket name: "neptune-import09232020"
- Region: US East (N. Virginia)
Add tag "name" for Key and "neptune-import" for Value.
Select Next >> Next >> Create bucket.
Visit this lab’s content repo and download the neptune-data.rdf file to your local machine.
Click on the bucket’s name and select Upload.
Click Add files.
Select the neptune-data.rdf file from your machine.
Click on Services, and select the IAM service.
Select Roles >> Create role >> S3 >> Next: Permissions.
In the search bar, type "S3", select AmazonS3ReadOnlyAccess, and select Next: Tags >> Next: Review.
On the Create role page:
- Role name: "neptune-import".
- Click Create role.
Search for and select the neptune-import role.
Click Trust relationships >> Edit trust relationship, and edit the "Service": line so that it names the service that Neptune uses to assume the role.
Select Update Trust Policy.
Click on Services to find and select Neptune.
Select the cluster’s name, Actions >> Manage IAM roles.
On the Manage IAM roles page, ensure the neptune-import role is listed under Current IAM roles for this cluster. If it is not, select neptune-import from the dropdown, then select Add role.
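The trust relationship edited in the steps above can be sketched as a JSON policy document. This is a minimal sketch under an assumption: the lab's own text for the "Service": line did not survive in this copy, so the value `rds.amazonaws.com` used here is taken from the Neptune documentation for bulk-load roles rather than from the lab itself. The function name `build_trust_policy` is hypothetical.

```python
import json


def build_trust_policy(service_principal="rds.amazonaws.com"):
    """Return an IAM trust policy that lets the given service assume the role.

    Assumption: Neptune assumes bulk-load roles through the RDS service
    principal, so "rds.amazonaws.com" is the default here.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the document in the form the Edit trust relationship editor expects.
    print(json.dumps(build_trust_policy(), indent=2))
```

The printed JSON can be compared against what the console shows after you select Update Trust Policy.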
- Load the Data
- In the AWS Management Console, search for the VPC service and select it.
- Click on Endpoints >> Create Endpoint.
- Set the following settings:
- Service category: Find service by name.
- Service Name: "com.amazonaws.us-east-1.s3", and click Verify.
- VPC: Select the existing VPC from the dropdown menu.
- Route Table ID: Select the ID with the two subnets.
- Click the Create endpoint button.
- In the AWS Management Console, search for the Neptune service and select it.
- Select the name of the cluster, and copy the Cluster endpoint name. (Note the port number, 8182, next to the Cluster endpoint. You will need this info momentarily.)
- Search for the IAM service and select it. Click Roles and search for "neptune".
- Select the neptune-import role and copy its Role ARN. You will need this info momentarily.
- Connect to the bastion host using the credentials provided and save the cluster endpoint name in a variable.
- Install curl with HTTPS support.
- Use curl to submit the load request. Be sure to replace the `iamRoleArn` value with your ARN.
- Press Enter. If successful, a "200 OK" status will appear.
- Copy the `loadId` (from the "200 OK" message) and use it to monitor the load.
- Press Enter. If successful, a "200 OK" status will also appear.
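The load request and the status check above can also be sketched in Python rather than curl. This is a minimal sketch, not the lab's own commands (which did not survive in this copy). It assumes the Neptune loader's HTTP interface: a POST to `/loader` with `source`, `format`, `iamRoleArn`, and `region` fields, and a GET to `/loader/<loadId>` for status. It also assumes `rdfxml` is the right format for neptune-data.rdf. The endpoint, account ID, role name, and function names are hypothetical placeholders.

```python
import json
import urllib.request


def build_load_request(endpoint, bucket, key, role_arn,
                       region="us-east-1", port=8182):
    """Build (but do not send) a bulk-load request for the Neptune loader."""
    url = f"https://{endpoint}:{port}/loader"
    body = {
        "source": f"s3://{bucket}/{key}",  # S3 object uploaded earlier
        "format": "rdfxml",                # assumption: the file is RDF/XML
        "iamRoleArn": role_arn,            # role attached to the cluster
        "region": region,
        "failOnError": "FALSE",            # optional loader setting
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_status_url(endpoint, load_id, port=8182):
    """Return the URL to poll for the status of a submitted load."""
    return f"https://{endpoint}:{port}/loader/{load_id}"


if __name__ == "__main__":
    # Hypothetical values; substitute your own endpoint and Role ARN.
    req = build_load_request(
        "your-cluster-endpoint",
        "neptune-import09232020",
        "neptune-data.rdf",
        "arn:aws:iam::123456789012:role/your-role-name",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # From the bastion host, urllib.request.urlopen(req) would send it.
```

Building the request separately from sending it makes it easy to inspect the JSON body before anything reaches the cluster, which is useful because the endpoint is only reachable from inside the VPC.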
- Query the Data
- Download the RDF4J client.
- Extract the client.
- Create a SPARQL repo. Be sure to include your Neptune endpoint and append `:8182/sparql` at the end of the line.
- Type "yes" to overwrite. If successful, a "repository created" message should appear.
- Open the repo to view the data that was loaded from the S3 bucket.
- Query the data.
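As an alternative to the RDF4J console, the query step can be sketched as a plain HTTP request to the same `:8182/sparql` endpoint. This is a minimal sketch under assumptions: it follows the SPARQL 1.1 Protocol convention of POSTing a form-encoded `query` parameter, the sample query is a generic "first ten triples" pattern rather than the lab's own query, and the endpoint and function name are hypothetical.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query, port=8182):
    """Build (but do not send) a SPARQL query request for the endpoint."""
    url = f"https://{endpoint}:{port}/sparql"
    # The SPARQL protocol accepts the query as a form-encoded "query" field.
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Generic query: list up to ten triples from the loaded data.
    sample_query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
    req = build_sparql_request("your-cluster-endpoint", sample_query)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # From the bastion host, urllib.request.urlopen(req) would send it.
```

A small `LIMIT` keeps the first check quick: if any triples come back, the bulk load reached the database.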