
Query execution essentials
Query execution is driven by the relational engine in SQL Server. This means executing the plan that resulted from the optimization process. In this section, we will focus on the highlighted sections of the following diagram, which handle query execution:

Before execution starts, the relational engine needs to reserve the estimated amount of memory necessary to run the query, known as a memory grant. During execution, the relational engine also schedules the worker threads (also known as threads, or workers) on which the query's tasks run, and provides inter-thread communication. The number of worker threads spawned depends on the following two key aspects:
- Whether the plan was eligible for parallelism as determined by the Query Optimizer.
- The actual available Degree of Parallelism (DOP) in the system, based on current load. This may differ from the estimated DOP, which is based on the server's Max Degree of Parallelism (MAXDOP) configuration. For example, MAXDOP may be 8 while the DOP available at runtime is only 2, which impacts query performance.
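The relationship between these two aspects can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not how SQL Server computes it internally: the function name and parameters are hypothetical, `maxdop` is assumed to be a positive number (the special setting of 0 is ignored here), and the result is the runtime DOP rather than an exact worker thread count.

```python
def runtime_dop(plan_is_parallel: bool, maxdop: int, available_dop: int) -> int:
    """Toy estimate of the DOP a query runs with (hypothetical helper).

    plan_is_parallel: whether the Query Optimizer produced a parallel plan.
    maxdop:           the configured MAXDOP, assumed here to be >= 1.
    available_dop:    the DOP the system can offer under its current load.
    """
    if not plan_is_parallel:
        # A serial plan always runs at a DOP of 1.
        return 1
    # A parallel plan cannot exceed MAXDOP, and cannot exceed what the
    # system can currently provide; it never drops below 1.
    return max(1, min(maxdop, available_dop))
```

With the values from the example above, `runtime_dop(True, 8, 2)` evaluates to 2, even though MAXDOP is 8.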
During execution, as the parts of the plan that require data from the base tables are processed, the relational engine requests that the storage engine provide data from the relevant rowsets. The relational engine then processes the data returned from the storage engine into the format defined by the T-SQL statement and returns the result set to the client.
The preceding key aspects do not change even on highly concurrent systems. However, because SQL Server must handle many requests with limited resources, it relies on waiting and queuing to share those resources among them.
To understand waits and queues in SQL Server, it is important to introduce other query-execution-related concepts. From an execution standpoint, this is what happens when a client application needs to execute a query:

Tasks and workers can naturally accumulate waits until a request completes. We will see how to monitor these in Chapter 8, Building Diagnostic Queries Using DMVs and DMFs. These waits are surfaced in each request, which can exist with different statuses during its execution:

Let's explore the different statuses mentioned in the preceding diagram:
- Running: When a task is actively running within a scheduler.
- Runnable: When a task is waiting in a first-in, first-out (FIFO) queue for scheduler time, but otherwise has access to the resources it requires, such as data pages.
- Suspended: When a task that is running in a scheduler finds that a required resource, such as a data page, is not available at the moment, it voluntarily yields its allotted processor time so that another request can proceed and the processor does not sit idle. However, a task can also be in this state before it even gets on a scheduler. For example, if there isn't enough memory to grant to a new incoming query, that query must wait for memory to become available before starting the actual execution.
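The transitions between these three statuses can be illustrated with a small toy model in Python. This is a minimal sketch under simplifying assumptions, not SQL Server's actual scheduler: it models a single scheduler, the class and method names (`ToyScheduler`, `submit`, `wait_for`, `signal`) are hypothetical, and it does not cover the case where a query is suspended before it ever reaches a scheduler.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    RUNNABLE = "runnable"
    SUSPENDED = "suspended"


class ToyScheduler:
    """Hypothetical single-scheduler model of the three task statuses."""

    def __init__(self):
        self.running = None      # at most one task is running at a time
        self.runnable = deque()  # FIFO queue of tasks waiting for scheduler time
        self.suspended = {}      # task -> resource that the task is waiting for

    def submit(self, task):
        """A task that has its resources queues up for scheduler time."""
        self.runnable.append(task)
        self._dispatch()

    def wait_for(self, resource):
        """The running task needs an unavailable resource and yields."""
        if self.running is None:
            raise RuntimeError("no task is currently running")
        self.suspended[self.running] = resource
        self.running = None
        self._dispatch()

    def signal(self, resource):
        """A resource became available: its waiters join the back of the queue."""
        ready = [t for t, r in self.suspended.items() if r == resource]
        for task in ready:
            del self.suspended[task]
            self.runnable.append(task)
        self._dispatch()

    def status(self, task):
        """Return the current status of a known task."""
        if task == self.running:
            return Status.RUNNING
        if task in self.suspended:
            return Status.SUSPENDED
        if task in self.runnable:
            return Status.RUNNABLE
        raise KeyError(task)

    def _dispatch(self):
        """If the scheduler is idle, run the task at the head of the queue."""
        if self.running is None and self.runnable:
            self.running = self.runnable.popleft()
```

In this model, submitting tasks A and B leaves A running and B runnable. If A then calls `wait_for` on a data page, A becomes suspended and B starts running. When the page is signaled as available, A does not resume immediately: it moves to the back of the runnable queue and waits for scheduler time again.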
All these concepts and terms play a fundamental role in understanding query execution and are also important to keep in mind when troubleshooting query performance. We will further explore how to detect some of these execution conditions in Chapter 4, Exploring Query Execution Plans.