Compound queries
In the Basic queries section of this chapter, we discussed the simplest queries exposed by Elasticsearch. We also talked about the position-aware queries, called span queries, in the Span queries section. However, simple queries and span queries are not the only ones that Elasticsearch provides. Compound queries, as we call them, allow us to connect multiple queries together or alter the behavior of other queries. You may wonder whether you need such functionality. A very simple deployment may not, but anything beyond a single simple query will probably require compound queries. For example, combining a simple term query with a match_phrase query to get better search results is a good candidate for a compound query.
The bool query
The bool query allows us to wrap a virtually unbounded number of queries and connect them logically using one of the following sections:
should: The query wrapped in this section may or may not match. The number of should sections that have to match is controlled by the minimum_should_match parameter.
must: The query wrapped in this section must match in order for the document to be returned.
must_not: The query wrapped in this section must not match in order for the document to be returned.
Each of the preceding sections can be present multiple times in a single bool query. This allows us to build very complex queries with multiple levels of nesting (you can include a bool query in another bool query). Remember that the score of the resulting document is calculated by summing the scores of all the wrapped queries that the document matched.
In addition to the preceding sections, we can add the following parameters to the query body to control its behavior:
filter: This allows us to specify the part of the query that should be used as a filter. You can read more about filters in the Filtering your results section in Chapter 4, Extending Your Querying Knowledge.
boost: This specifies the boost used in the query, defaulting to 1.0. The higher the boost, the higher the score of the matching document.
minimum_should_match: This describes the minimum number of should clauses that have to match in order for the checked document to be counted as a match. It can be, for example, an integer value such as 2 or a percentage value such as 75%. For more information, refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html.
disable_coord: A Boolean parameter (defaults to false) that allows us to enable or disable the score factor computation that is based on the fraction of all the query terms that a document contains. We should set it to true for less precise scoring, but slightly faster queries.
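To make the minimum_should_match parameter more concrete, the following Python sketch resolves a value such as 2 or 75% into the number of should clauses that have to match. The function name required_should_matches is hypothetical, only the non-negative integer and percentage forms mentioned above are handled, and the sketch assumes that a percentage is rounded down, as the linked documentation describes. It is a simplified model, not Elasticsearch's own code.

```python
def required_should_matches(minimum_should_match, should_count):
    """Return how many should clauses must match (simplified model).

    Assumption: only non-negative integers and percentages such as "75%"
    are supported, and percentages are rounded down.
    """
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        # Integer arithmetic rounds the result down.
        return percent * should_count // 100
    return int(minimum_should_match)


# With four should clauses, "75%" means that three of them have to match.
example_required = required_should_matches("75%", 4)
```

With three should clauses instead of four, the same 75% setting resolves to two required matches, because 2.25 is rounded down.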
Imagine that we want to find all the documents that have the term crime in the title field. In addition, the documents may or may not have a value between 1900 and 2000 in the year field, and must not have the term nothing in the otitle field. Such a query made with the bool query will look as follows:
{
  "query" : {
    "bool" : {
      "must" : {
        "term" : { "title" : "crime" }
      },
      "should" : {
        "range" : { "year" : { "from" : 1900, "to" : 2000 } }
      },
      "must_not" : {
        "term" : { "otitle" : "nothing" }
      }
    }
  }
}
Note
The must, should, and must_not sections can each contain a single query or an array of queries.
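As an illustration of the array form, the following Python sketch builds a variant of the preceding bool query in which every section holds a list of queries, and then serializes it to a JSON request body. The second should clause (a match on the author field, borrowed from elsewhere in this chapter) and the minimum_should_match value of 1 are assumptions added only for the sake of the example.

```python
import json

# Each section holds an array of queries instead of a single query.
bool_query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"title": "crime"}},
            ],
            "should": [
                {"range": {"year": {"from": 1900, "to": 2000}}},
                # Assumed extra clause, added for illustration only.
                {"match": {"author": "fyodor"}},
            ],
            "must_not": [
                {"term": {"otitle": "nothing"}},
            ],
            # Assumed value: at least one should clause has to match.
            "minimum_should_match": 1,
        }
    }
}

# Serialize the dictionary to the JSON body that would be sent with the request.
request_body = json.dumps(bool_query, indent=2)
print(request_body)
```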
The dis_max query
The dis_max query is very useful, as it generates a union of the documents returned by all the subqueries and returns it as the result. The good thing about this query is that we can control how the lower-scoring subqueries affect the final score of the documents. For the dis_max query, we specify the queries using the queries property (a query or an array of queries) and the tie breaker using the tie_breaker property. We can also include an additional boost by specifying the boost parameter.
The final document score is calculated as the score of the maximum scoring query plus the sum of the scores returned from the rest of the queries multiplied by the value of the tie_breaker parameter. So, the tie_breaker parameter allows us to control how the lower-scoring queries affect the final score. If we set the tie_breaker parameter to 1.0, we get the exact sum of all the scores, while setting it to 0.1 results in only 10 percent of the scores (of all the scores apart from the maximum scoring query) being added to the final score.
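The calculation described above can be sketched in a few lines of Python. The function name dis_max_score and the sample subquery scores are hypothetical, and the boost parameter and any score normalization are deliberately left out, so treat this as a model of the formula rather than of the actual Elasticsearch scoring code.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine the scores of the matching subqueries for one document.

    Simplified model: the best score plus tie_breaker times the sum of
    all the remaining scores.
    """
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    rest = sum(sub_scores) - best
    return best + tie_breaker * rest


# Hypothetical scores of three subqueries that matched the same document.
scores = [2.0, 1.0, 0.5]

exact_sum = dis_max_score(scores, tie_breaker=1.0)   # all scores are added
ten_percent = dis_max_score(scores, tie_breaker=0.1) # 10 percent of the rest
best_only = dis_max_score(scores, tie_breaker=0.0)   # only the best score
```

With a tie_breaker of 0.0, the lower-scoring subqueries do not contribute at all, and the document simply gets the score of its best-matching subquery.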
An example of the dis_max query is as follows:
{
  "query" : {
    "dis_max" : {
      "tie_breaker" : 0.99,
      "boost" : 10.0,
      "queries" : [
        { "match" : { "title" : "crime" } },
        { "match" : { "author" : "fyodor" } }
      ]
    }
  }
}
As you can see, we included the tie_breaker and boost parameters. In addition to that, we specified the queries parameter, which holds the array of queries that will be run and used to generate the union of documents for the results.
The boosting query
The boosting query wraps around two queries and lowers the score of the documents returned by one of the queries. There are three sections of the boosting query that need to be defined: the positive section, which holds the query whose document score will be left unchanged; the negative section, whose matching documents will have their score lowered; and the negative_boost section, which holds the boost value that will be used to lower the score of the documents matched by the negative section. The advantage of the boosting query is that documents matching the negative query stay in the results as long as they match the positive query; only their scores are lowered. For comparison, if we were to use the bool query with the must_not section, such documents would be dropped from the results entirely.
Let's assume that we want to have the results of a simple term query for the term crime in the title field and want the score of such documents to not be changed. However, for those documents whose year field falls between 1800 and 1900, we want the score to be lowered by applying a negative boost of 0.5. Such a query will look like the following:
{
  "query" : {
    "boosting" : {
      "positive" : {
        "term" : { "title" : "crime" }
      },
      "negative" : {
        "range" : { "year" : { "from" : 1800, "to" : 1900 } }
      },
      "negative_boost" : 0.5
    }
  }
}
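The effect of negative_boost on the preceding query can be modeled with a short Python sketch. It assumes that only documents already matching the positive query are scored and that the score of a document also matching the negative query is multiplied by negative_boost. The function name boosting_score, the document identifiers, and the sample scores are hypothetical.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Return the final score of a document that matched the positive query."""
    if matches_negative:
        # The document stays in the results, but its score is lowered.
        return positive_score * negative_boost
    return positive_score


# Hypothetical documents: (identifier, score from the title query, year).
docs = [("doc1", 1.0, 1866), ("doc2", 0.8, 1950)]

# Apply the negative range (1800 to 1900) with a negative_boost of 0.5.
rescored = {
    doc_id: boosting_score(score, 1800 <= year <= 1900, 0.5)
    for doc_id, score, year in docs
}
```

In this model, doc1 falls into the negative range and keeps only half of its score, while doc2 is left untouched, so doc2 would now be ranked first.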
The constant_score query
The constant_score query wraps another query and returns a constant score for each document returned by the wrapped query. We specify the score that should be given to the documents by using the boost property, which defaults to 1.0. It allows us to strictly control the score value assigned to a document matched by a query. For example, if we want to have a score of 2.0 for all the documents that have the term crime in the title field, we send the following query to Elasticsearch:
{
  "query" : {
    "constant_score" : {
      "query" : {
        "term" : { "title" : "crime" }
      },
      "boost" : 2.0
    }
  }
}
The indices query
The indices query is useful when executing a query against multiple indices. It allows us to provide an array of indices (the indices property) and two queries: one that will be executed if we query an index from the list (the query property) and a second that will be executed on all the other indices (the no_match_query property). For example, assume we have an alias named books, holding two indices: library and users. We want to use this alias, but run different queries depending on which index is being searched. An example query following this logic will look as follows:
{
  "query" : {
    "indices" : {
      "indices" : [ "library" ],
      "query" : {
        "term" : { "title" : "crime" }
      },
      "no_match_query" : {
        "term" : { "user" : "crime" }
      }
    }
  }
}
In the preceding query, the query described in the query property is run against the library index, and the query defined in the no_match_query section is run against all the other indices being searched, which for our hypothetical alias means the users index.
The no_match_query property can also have a string value instead of a query. This string value can be either all or none, and it defaults to all. If the no_match_query property is set to all, all the documents from the indices that are not on the list will be returned. Setting the no_match_query property to none will result in no documents being returned from the indices that are not on the list.
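The way the indices query picks a query for each index can be summarized with the following Python sketch. The function query_for_index is a hypothetical helper written only for this illustration; it represents the all setting as a match_all query and the none setting as None, which is an assumption of the sketch rather than something Elasticsearch exposes.

```python
def query_for_index(index, indices, query, no_match_query="all"):
    """Return the query that would run against the given index (simplified)."""
    if index in indices:
        return query
    if no_match_query == "all":
        # Every document from the index that is not on the list is returned.
        return {"match_all": {}}
    if no_match_query == "none":
        # No documents are returned from the index that is not on the list.
        return None
    # Otherwise, no_match_query is itself a query.
    return no_match_query


title_query = {"term": {"title": "crime"}}
user_query = {"term": {"user": "crime"}}

# The books alias from the example covers the library and users indices.
library_choice = query_for_index("library", ["library"], title_query, user_query)
users_choice = query_for_index("users", ["library"], title_query, user_query)
users_default = query_for_index("users", ["library"], title_query)
users_none = query_for_index("users", ["library"], title_query, "none")
```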