Reindex Elasticsearch Documents

3 hours
  • 3 Learning Objectives

About this Hands-on Lab

Whether you need to change the mapping of an existing index or copy a subset of data from one index to another, the `_reindex` API in Elasticsearch has you covered. In this hands-on lab, you will practice the following:

* Reindex a subset of data from one index to a new index
* Create an ingest node pipeline
* Transform data during the reindexing process

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create the romeo_and_juliet index.

Use the Kibana console tool to execute the following:

PUT romeo_and_juliet
{
  "mappings": {
    "properties": {
      "line_id": {
        "type": "integer"
      },
      "line_number": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "play_name": {
        "type": "keyword"
      },
      "speaker": {
        "type": "keyword"
      },
      "speech_number": {
        "type": "integer"
      },
      "text_entry": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "type": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 3
  }
}

Create the shakespeare-tokenizer ingest node pipeline.

Use the Kibana console tool to execute the following:

PUT _ingest/pipeline/shakespeare-tokenizer
{
  "description": "Tokenizes the text_entry field into an array. Adds a word_count field. Removes the play_name field.",
  "processors": [
    {
      "split": {
        "field": "text_entry",
        "separator": "\\s+",
        "target_field": "word_array"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.word_count = ctx.word_array.length"
      }
    },
    {
      "remove": {
        "field": "play_name"
      }
    }
  ]
}
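
A note on the split separator: in strict JSON, a backslash inside a string must itself be escaped, so the regular expression \s+ is written with two backslashes in a request body. The following is a minimal Python sketch using only the standard json module (it does not contact Elasticsearch) that shows how each spelling is parsed. The variable names are illustrative only.

```python
import json

# Raw strings keep the backslashes exactly as they would appear in a request body.
escaped = r'{"separator": "\\s+"}'    # two backslashes in the JSON text
unescaped = r'{"separator": "\s+"}'   # one backslash: \s is not a valid JSON escape

# The escaped form parses to the three-character regex: backslash, s, plus.
pattern = json.loads(escaped)["separator"]
print(pattern)

# The unescaped form is rejected by a strict JSON parser.
try:
    json.loads(unescaped)
    unescaped_ok = True
except json.JSONDecodeError:
    unescaped_ok = False
print(unescaped_ok)
```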

Reindex the play “Romeo and Juliet”.

Use the Kibana console tool to execute the following:

POST _reindex
{
  "source": {
    "index": "shakespeare",
    "query": {
      "match": {
        "play_name": "Romeo and Juliet"
      }
    }
  },
  "dest": {
    "index": "romeo_and_juliet",
    "pipeline": "shakespeare-tokenizer"
  }
}

Additional Resources

You work as a research librarian studying the works of Shakespeare, specifically Romeo and Juliet. You have a 6-node Elasticsearch cluster that holds the complete works of Shakespeare, which you use for your literary analysis. The complete works are currently indexed to a single index called shakespeare, but since you are focused on Romeo and Juliet, you would prefer to copy this play to its own index.

To accomplish this, you will first need to create a new index called romeo_and_juliet with the same field mappings as the shakespeare index. Since your 6-node Elasticsearch cluster has only 4 data nodes, you want to create the romeo_and_juliet index with 4 primary shards and 3 replica shards for maximum replication. Once the romeo_and_juliet index has been created, you will need to use the _reindex API to copy all documents with a play_name of "Romeo and Juliet" to the romeo_and_juliet index.
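
The shard numbers above can be checked with a little arithmetic. The sketch below is plain Python, not an Elasticsearch call, and its variable names are illustrative. It relies on one fact about allocation: Elasticsearch does not place two copies of the same shard on the same node, so the number of replicas that can be assigned is capped at the number of data nodes minus one.

```python
# Values taken from the lab scenario.
data_nodes = 4
primary_shards = 4
replicas_per_primary = 3

# Each primary plus its replicas forms one group of identical copies.
copies_per_shard = 1 + replicas_per_primary
total_shard_copies = primary_shards * copies_per_shard

# Two copies of the same shard never share a node, so the replica
# count that can be fully assigned is data nodes minus one.
max_assignable_replicas = data_nodes - 1

print(copies_per_shard)         # copies of each shard
print(total_shard_copies)       # shard copies in the whole index
print(max_assignable_replicas)  # highest replica count that can be fully assigned
```

With 3 replicas on 4 data nodes, every data node can hold one copy of every shard, which is why 3 is the maximum useful replica count in this scenario.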

In addition to copying the data for the play Romeo and Juliet to its own index, you also want to modify the data in flight during the reindexing process. Specifically, you want to take the contents of the text_entry field and store each whitespace-delimited word in an array called word_array. You also want to add a word_count field equal to the number of words in the word_array field. Lastly, because the index will only contain data for the play Romeo and Juliet, you can remove the play_name field. All of this can be accomplished with an ingest node pipeline using the split, script, and remove processors.
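
To make the intended transformation concrete, here is a rough local emulation of the three steps in Python. It is only a sketch of the expected result, not how Elasticsearch implements the processors, and edge cases such as leading whitespace may behave differently. The function name and the sample document's line_id and speaker values are made up for illustration.

```python
import re


def shakespeare_tokenizer(doc):
    """Emulate the split, script, and remove steps on one document."""
    out = dict(doc)  # work on a copy so the input stays unchanged
    # split: break text_entry on runs of whitespace into word_array
    out["word_array"] = re.split(r"\s+", out["text_entry"])
    # script: word_count is the number of entries in word_array
    out["word_count"] = len(out["word_array"])
    # remove: drop play_name (raises KeyError if the field is absent)
    del out["play_name"]
    return out


# Hypothetical sample document; field values are illustrative.
sample = {
    "line_id": 1,
    "play_name": "Romeo and Juliet",
    "speaker": "ROMEO",
    "text_entry": "But, soft! what light through yonder window breaks?",
}

result = shakespeare_tokenizer(sample)
print(result["word_array"])
print(result["word_count"])
print("play_name" in result)
```

Punctuation stays attached to each word because the split is on whitespace only.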

To use Kibana, navigate to the public IP address of the coordinator-1 node in your web browser and log in with:

  • Username: elastic
  • Password: la_elastic_409
