Elasticsearch Server(Third Edition)

Compound queries

In the Basic queries section of this chapter, we discussed the simplest queries exposed by Elasticsearch. We also talked about the position-aware queries, called span queries, in the Span queries section. However, these are not the only queries that Elasticsearch provides. The compound queries, as we call them, allow us to connect multiple queries together or alter the behavior of other queries. You may wonder if you need such functionality. A very simple deployment may not, but anything beyond a single simple query will probably require compound queries. For example, combining a simple term query with a match_phrase query to get better search results is a good candidate for a compound query.

The bool query

The bool query allows us to wrap a virtually unbounded number of queries and combine them using Boolean logic, by placing each query in one of the following sections:

  • should: The query wrapped into this section may or may not match. The number of should clauses that have to match is controlled by the minimum_should_match parameter.
  • must: The query wrapped into this section must match in order for the document to be returned.
  • must_not: The query wrapped into this section must not match in order for the document to be returned.

Each of the preceding sections can be present multiple times in a single bool query. This allows us to build very complex queries that have multiple levels of nesting (you can include a bool query in another bool query). Remember that the score of the resulting document is calculated as the sum of the scores of all the wrapped queries that the document matched.

In addition to the preceding sections, we can add the following parameters to the query body to control its behavior:

  • filter: This allows us to specify the part of the query that should be used as a filter. You can read more about filters in the Filtering your results section in Chapter 4, Extending Your Querying Knowledge.
  • boost: This specifies the boost used in the query, defaulting to 1.0. The higher the boost, the higher the score of the matching document.
  • minimum_should_match: This describes the minimum number of should clauses that have to match in order for the checked document to be counted as a match. For example, it can be an integer value such as 2 or a percentage value such as 75%. For more information, refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html.
  • disable_coord: A Boolean parameter (defaults to false) that allows us to enable or disable the score factor computation that is based on the fraction of all the query terms that a document contains. Setting it to true gives less precise scoring, but slightly faster queries.
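
To make the integer and percentage forms of minimum_should_match more concrete, here is a minimal Python sketch. The helper name required_should_matches is hypothetical and is not part of Elasticsearch. It assumes, following the documentation linked above, that a percentage is rounded down, and it handles only positive integers and positive percentages, not negative values or conditional specifications.

```python
def required_should_matches(spec, optional_clauses):
    """Resolve a simplified minimum_should_match value to a clause count.

    Hypothetical helper for illustration only. Handles a positive integer
    (2 or "2") or a positive percentage ("75%"); other forms are left out.
    """
    text = str(spec).strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        # Assumption: the value computed from a percentage is rounded down.
        needed = optional_clauses * percent // 100
    else:
        needed = int(text)
    # Never require more should clauses than the query actually contains.
    return min(needed, optional_clauses)


if __name__ == "__main__":
    for spec in (2, "75%"):
        print(spec, "->", required_should_matches(spec, 4))
```

With four should clauses, both 2 and 75% are easy to reason about: the first requires two matching clauses and the second requires three.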

Imagine that we want to find all the documents that have the term crime in the title field. In addition, the documents may or may not have a value in the range of 1900 to 2000 in the year field, and must not have the term nothing in the otitle field. Such a query made with the bool query will look as follows:

{
  "query" : {
    "bool" : {
      "must" : {
        "term" : {
          "title" : "crime"
        }
      },
      "should" : {
        "range" : {
          "year" : {
            "from" : 1900,
            "to" : 2000
          }
        }
      },
      "must_not" : {
        "term" : {
          "otitle" : "nothing"
        }
      }
    }
  }
}
Note

Note that the must, should, and must_not sections can contain a single query or an array of queries.
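
As an illustration of the array form, the following Python sketch builds a variant of the preceding query in which every section holds a list of queries, and serializes it to a JSON request body. The second should clause (a term query on the author field) and the minimum_should_match value of 1 are assumptions added only for this example.

```python
import json

# Each section holds an array of queries instead of a single query.
bool_query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"title": "crime"}},
            ],
            "should": [
                {"range": {"year": {"from": 1900, "to": 2000}}},
                # Assumed extra clause, added only to show a second entry.
                {"term": {"author": "fyodor"}},
            ],
            "must_not": [
                {"term": {"otitle": "nothing"}},
            ],
            # Assumed value: at least one should clause has to match.
            "minimum_should_match": 1,
        }
    }
}

# The string that would be sent as the body of the search request.
request_body = json.dumps(bool_query, indent=2)

if __name__ == "__main__":
    print(request_body)
```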

The dis_max query

The dis_max query generates a union of the documents returned by all of its subqueries and returns it as the result. The good thing about this query is that we can control how the lower-scoring subqueries affect the final score of the documents. For the dis_max query, we specify the subqueries using the queries property (a query or an array of queries) and the tie breaker using the tie_breaker property. We can also include an additional boost by specifying the boost parameter.

The final document score is the score of the highest-scoring subquery plus the sum of the scores returned by the rest of the subqueries, with that sum multiplied by the value of the tie_breaker parameter. So, the tie_breaker parameter allows us to control how the lower-scoring queries affect the final score. If we set the tie_breaker parameter to 1.0, we get the exact sum of all the scores, while setting it to 0.1 results in only 10 percent of the remaining scores (all the scores apart from the maximum one) being added to the final score.
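
The calculation described above can be sketched in a few lines of Python. This is a simplified illustration, not Elasticsearch code: the function name dis_max_score is hypothetical, the subquery scores are made-up numbers, and the boost parameter and any score normalization are ignored.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine the scores of the matching subqueries the way the text describes.

    sub_scores: scores of the subqueries that matched a single document.
    tie_breaker: fraction of the non-maximum scores added to the best score.
    """
    if not sub_scores:
        raise ValueError("at least one subquery has to match")
    best = max(sub_scores)
    rest = sum(sub_scores) - best  # sum of all scores except the maximum
    return best + tie_breaker * rest


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5]  # hypothetical subquery scores for one document
    for tie_breaker in (0.0, 0.1, 1.0):
        print(tie_breaker, dis_max_score(scores, tie_breaker))
```

With a tie_breaker of 0.0 only the best subquery counts, and with 1.0 the result equals the plain sum of all subquery scores.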

An example of the dis_max query is as follows:

{
  "query" : {
    "dis_max" : {
      "tie_breaker" : 0.99,
      "boost" : 10.0,
      "queries" : [
        {
          "match" : {
            "title" : "crime"
          }
        },
        { 
          "match" : {
            "author" : "fyodor"
          }
        } 
      ] 
    } 
  }
}

As you can see, we included the tie_breaker and boost parameters. In addition to that, we specified the queries parameter that holds the array of queries that will be run and used to generate the union of documents for results.

The boosting query

The boosting query wraps two queries and lowers the score of the documents matched by one of them. There are three sections of the boosting query that need to be defined: the positive section, which holds the query whose document scores are left unchanged; the negative section, which holds the query whose matching documents have their scores lowered; and the negative_boost section, which holds the boost value used to lower those scores. The advantage of the boosting query is that documents matching the negative query are not removed from the results; they are still returned, only with a lower score. For comparison, if we were to use the bool query with the must_not section, such documents would not be returned at all.

Let's assume that we want the results of a simple term query for the term crime in the title field, with the scores of those documents left unchanged. However, for the documents whose year field falls in the range of 1800 to 1900, we want the score to be multiplied by an additional boost of 0.5. Such a query will look like the following:

{
  "query" : {
    "boosting" : {
      "positive" : {
        "term" : {
          "title" : "crime"
        }
      },
      "negative" : {
        "range" : {
          "year" : {
            "from" : 1800,
            "to" : 1900
          }
        } 
      },
      "negative_boost" : 0.5
    }
  }
}
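
The effect of negative_boost can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the function name boosting_score and the document scores are hypothetical, and the negative_boost value is assumed to be applied as a plain multiplier on the score produced by the positive query.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Return the score of a document that matched the positive query.

    Assumption: a document that also matches the negative query keeps its
    place in the results, with its score multiplied by negative_boost.
    """
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


if __name__ == "__main__":
    # Hypothetical documents that all matched the positive term query.
    documents = [
        {"title": "crime a", "year": 1886, "score": 3.0},
        {"title": "crime b", "year": 1950, "score": 2.0},
    ]
    for doc in documents:
        in_negative_range = 1800 <= doc["year"] <= 1900
        doc["final"] = boosting_score(doc["score"], in_negative_range, 0.5)
    # The demoted document is still present, just ranked lower.
    for doc in sorted(documents, key=lambda d: d["final"], reverse=True):
        print(doc["title"], doc["final"])
```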

The constant_score query

The constant_score query wraps another query and returns a constant score for each document returned by the wrapped query. We specify the score that should be given to the documents by using the boost property, which defaults to 1.0. It allows us to strictly control the score value assigned for a document matched by a query. For example, if we want to have a score of 2.0 for all the documents that have the term crime in the title field, we send the following query to Elasticsearch:

{
  "query" : {
    "constant_score" : {
      "query" : {
        "term" : {
          "title" : "crime"
        }
      }, 
      "boost" : 2.0
    }
  }
}

The indices query

The indices query is useful when executing a query against multiple indices. It allows us to provide an array of indices (the indices property) and two queries, one that will be executed if we query the index from the list (the query property) and the second that will be executed on all the other indices (the no_match_query property). For example, assume we have an alias named books, holding two indices: library and users. What we want to do is use this alias. However, we want to run different queries depending on which index is used for searching. An example query following this logic will look as follows:

{
  "query" : {
    "indices" : {
      "indices" : [ "library" ],
      "query" : {
        "term" : {
          "title" : "crime"
        }
      },
      "no_match_query" : {
        "term" : {
          "user" : "crime"
        }
      }
    } 
  }
}

In the preceding query, the query described in the query property is run against the library index and the query defined in the no_match_query section is run against all the other indices being searched, which for our hypothetical alias means the users index.

The no_match_query property can also have a string value instead of a query. This string value can be either all or none, and it defaults to all. If the no_match_query property is set to all, all the documents from the indices that are not on the indices list will be returned. Setting the no_match_query property to none will result in no documents being returned from those indices.
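
The selection logic described in this section can be sketched in Python as follows. This is only an illustration of the behavior described above, not how Elasticsearch implements it: the function name query_for_index is hypothetical, all is modeled as a match_all query, and none is modeled as returning no query at all.

```python
MATCH_ALL = {"match_all": {}}


def query_for_index(index_name, indices_query):
    """Pick the query that applies to a single index (illustrative only)."""
    if index_name in indices_query["indices"]:
        return indices_query["query"]
    # The no_match_query property defaults to the string "all".
    fallback = indices_query.get("no_match_query", "all")
    if fallback == "all":
        return MATCH_ALL  # every document from this index is returned
    if fallback == "none":
        return None  # no documents from this index are returned
    return fallback  # a regular query was provided


if __name__ == "__main__":
    indices_query = {
        "indices": ["library"],
        "query": {"term": {"title": "crime"}},
        "no_match_query": {"term": {"user": "crime"}},
    }
    for index_name in ("library", "users"):
        print(index_name, query_for_index(index_name, indices_query))
```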