
上QQ阅读APP看书,第一时间看更新
Mechanics of the Query Optimizer
The next step in our journey toward writing efficient T-SQL queries is understanding how the SQL Server database engine optimizes a query; we will do so by exploring T-SQL query optimization internals and architecture, starting with the infamous cardinality estimation process and its building blocks. From there, we will understand how the Query Optimizer uses that information to produce a just-in-time, good-enough execution plan. This chapter will be referenced throughout this book, as we apply architectural topics to real-world uses.
Before we get started, it's important to have a common frame of reference about the following terms:
- Cardinality: Cardinality in a database is defined as the number of records, also called tuples, in each table or view.
- Frequency: This term represents the average number of occurrences of a given value in a column or column set. It's defined as the number of rows times the density.
- Density: This term represents the average number of duplicate values in each column or column set, in other words, the average distribution of unique values in the data. It's defined as 1 divided by the number of distinct values.
- Selectivity: This term represents the fraction of the row count that satisfies a given predicate, between zero and one. This is calculated as the predicate cardinality (Pc) divided by the table cardinality (Tc) multiplied by one hundred: (Pc ÷ Tc) × 100%. As the average number of duplicates decreases (the density), the selectivity of a value increases. For example, in a table that represents streets and cities in a country, many streets and cities have the same name, but each street and city combination has a unique zip code. An index on the zip code is more selective than an index on the street or city because the zip code has a much lower density than a street or city.
- Statistics: Statistics are the metadata objects that we referred to in Chapter 2, Understanding Query Processing. They maintain information on the distribution of data in a table or indexed view, over a specific column or column set. We will discuss the role of statistics in more detail in the Introducing the Cardinality Estimator section.
- Histogram: This is a bucketized representation of the distribution of data in a specific column that is kept in a statistic object. These histograms hold aggregate information on the number of rows (cardinality) and distinct values (density) for up to 200 ranges of data values, named histogram steps. For any statistics object, the histogram is always created for the first column only. For multi-column statistics, this means that the histogram does not contain information about any additional column.
In this chapter, we will cover the following topics:
- Introducing the Cardinality Estimator
- Understanding the query optimization workflow