Elasticsearch 7.0 Cookbook (Fourth Edition)

How it works...

The reindex functionality introduced in Elasticsearch 5.x provides an efficient way to reindex documents.

In earlier Elasticsearch versions, this functionality had to be implemented on the client side. The advantages of the server-side implementation are as follows:

  • Fast copying of data because it is completely managed on the server side.
  • Better management of the operation due to the new task API.
  • Better error-handling support as it is done at the server level. This allows us to manage failovers better during reindex operations.
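To illustrate the task-based management mentioned above, a reindex can be started with wait_for_completion=false and then checked through the task API. The following is a minimal Python sketch that only builds the two REST requests and does not send them. The helper names, the index names, and the task ID format used in the comments are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode


def build_async_reindex_request(source_index, dest_index):
    """Return (method, path, body) for a reindex that runs as a background task."""
    # wait_for_completion=false asks Elasticsearch to reply immediately with a
    # task ID instead of blocking until the reindex has finished.
    query_string = urlencode({"wait_for_completion": "false"})
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return "POST", "/_reindex?" + query_string, json.dumps(body)


def build_task_status_request(task_id):
    """Return (method, path) for checking a task, e.g. a hypothetical 'node-1:42'."""
    return "GET", "/_tasks/" + task_id
```

The tuples returned by these helpers can be passed to any HTTP client. The task ID to use in the second request is the one returned in the response of the first.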

At the server level, this action is composed of the following steps:

  1. Initialization of an Elasticsearch task to manage the operation
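  2. Creation of the target index, if it does not already exist (the settings and mappings of the source index are not copied, so create the target index beforehand if you need specific ones)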
  3. Executing a query to collect the documents to be reindexed
  4. Reindexing all the documents using bulk operations until all documents are reindexed
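The select-then-copy-in-batches flow of the last two steps can be pictured with a toy in-memory model. This is only a conceptual sketch and not how Elasticsearch implements it: the indices are plain Python dictionaries, the query is an ordinary predicate function, and the name reindex_in_memory and the batch_size parameter are hypothetical.

```python
def reindex_in_memory(source_docs, dest_docs, query=lambda doc: True, batch_size=2):
    """Toy model: select matching documents, then copy them in fixed-size batches."""
    # Step 3: run the "query" to collect the IDs of the documents to reindex.
    matching_ids = [doc_id for doc_id, doc in source_docs.items() if query(doc)]
    batches = 0
    # Step 4: write the documents chunk by chunk until none are left,
    # mimicking repeated bulk operations.
    for start in range(0, len(matching_ids), batch_size):
        chunk = matching_ids[start:start + batch_size]
        for doc_id in chunk:
            dest_docs[doc_id] = dict(source_docs[doc_id])
        batches += 1
    return {"total": len(matching_ids), "batches": batches}
```

Calling reindex_in_memory(source, target) with two dictionaries copies every entry of source into target and reports how many documents and batches were processed.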

The main parameters that can be provided for this action are as follows:

  • The source section manages how to select source documents. The most important subsections are as follows:
    • index, which is the source index to be used. It can also be a list of indices.
    • query (optional), which is an Elasticsearch query used to select a subset of the documents.
    • sort (optional), which can be used to provide a way of sorting the documents.
  • The dest section controls how documents are written to the target. The most important parameters in this section are as follows:
    • index, which is the target index to be used. If it does not exist, it is created.
    • version_type (optional), which, if set to external, preserves the version of the source document.
    • routing (optional), which controls the routing in the destination index. It can be any of the following:
      • keep (the default), which preserves the original routing
      • discard, which discards the original routing
      • =<text>, which uses the text value for the routing
    • pipeline (optional), which allows you to define a custom pipeline for ingestion. We will see more about the ingestion pipeline in Chapter 12, Using the Ingest Module.
  • size (optional), which is the number of documents to be reindexed.
  • script (optional), which allows you to define a script for document manipulation. This case will be discussed in the Reindex with a custom script recipe in Chapter 8, Scripting in Elasticsearch.
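The parameters listed above can be combined into a single request body. The following minimal Python sketch assembles such a body and includes only the options that are actually set. The function name build_reindex_body and the index names are hypothetical. The sketch assumes the Elasticsearch 7.0 request layout, in which size and script sit at the top level of the body, next to source and dest.

```python
import json


def build_reindex_body(source_index, dest_index, query=None, sort=None,
                       version_type=None, routing=None, pipeline=None,
                       size=None, script=None):
    """Assemble a _reindex request body, skipping options that are not set."""
    # source section: which documents to read
    source = {"index": source_index}
    if query is not None:
        source["query"] = query
    if sort is not None:
        source["sort"] = sort

    # dest section: how documents are written to the target
    dest = {"index": dest_index}
    if version_type is not None:
        dest["version_type"] = version_type
    if routing is not None:
        dest["routing"] = routing  # "keep", "discard", or "=<text>"
    if pipeline is not None:
        dest["pipeline"] = pipeline

    body = {"source": source, "dest": dest}
    # Assumption: size and script are top-level keys, as in the 7.0 API
    if size is not None:
        body["size"] = size
    if script is not None:
        body["script"] = script
    return body


if __name__ == "__main__":
    example = build_reindex_body(
        "myindex", "myindex2",
        query={"match_all": {}},
        routing="discard",
        size=100,
    )
    print(json.dumps(example, indent=2))
```

The resulting dictionary can be serialized with json.dumps and sent as the body of a POST _reindex call.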