calculates the "Estimated Number of Rows"
when you have multiple predicates/filters & joins?
Well, you can find a brief summary below.
I assume that you are already familiar with the Statistics
(Density Vector, Histogram).
The algorithms depend if you use local variables,
parameters, multiple predicates
or a single predicate legacy CE, new CE, and more.
It is important to note when the SQL Server optimizer uses Density Vector
and when histogram.
Density is the ratio of unique values within the given column or a set of columns.
Density = 1 / (Number of distinct values for a column or set of column)
The Density vector is used when the optimizer cannot sniff the predicate value,
for example:
1) When you use local variables as predicates/filters in a stored procedure
or in a batch query
2) When you use OPTION (OPTIMIZE FOR UNKNOWN)
3) When the parameter sniffing feature is disabled
The Histogram is used when the optimizer is able to sniff the predicate value,
for example:
for example:
1) When you use input parameters as predicates/filters in a stored procedure
2) When you use sp_executesql with parameters
3) When you use local variables with OPTION(RECOMPILE)
So let`s review common use cases:
1) The table below refers to the use case when you use one filter
as a local variable and the statistics exist on the filtered column/index.
This also applies when you use OPTION (OPTIMIZE FOR UNKNOWN)
or when the parameter sniffing feature is disabled.
DECLARE @CustomerId INT =123
SELECT *
FROM dbo.Customers
WHERE CustomerId = @CustomerId
Table Cardinality - Total Number of Rows In Table
Density = 1 / ( # distinct values for column(s) )
D - Average length rounded down to integer (Average data length is taken from statistics density vector)
2) For use case, when you use one parameterized filter or literal,
The estimated number of rows will be selected from the histogram.
For example: Input parameters in a stored procedure or sp_executesql
SELECT *
FROM dbo.Customers
WHERE CustomerId = 123
3) When you have a query that uses more than one column predicate/filter:
SELECT *
FROM dbo.Customers
WHERE CustomerId = 123
AND Name ='XYZ'
According to Microsoft, regrading the JOIN Estimates:
The Legacy CE model assumes that users always query for data that exists.
This means that, for a join predicate that involves an equijoin operation for two tables,
the joined columns exist on both sides of the join.
In the presence of additional non-join filter predicates against the join table,
the Legacy CE assumes some level of correlation for the join predicates
and non-join filter predicates. This implied correlation is called Simple Containment.
Alternatively, the New CE uses Base Containment as the correlation.
The New CE model assumes that users might query for data that does not exist.
This means that the filter predicates on separate tables may not be correlated with each other.
Therefore, we use a probabilistic approach
1) The table below refers to the use case when you use one filter
as a local variable and the statistics exist on the filtered column/index.
This also applies when you use OPTION (OPTIMIZE FOR UNKNOWN)
or when the parameter sniffing feature is disabled.
DECLARE @CustomerId INT =123
SELECT *
FROM dbo.Customers
WHERE CustomerId = @CustomerId
Table Cardinality - Total Number of Rows In Table
Density = 1 / ( # distinct values for column(s) )
D - Average length rounded down to integer (Average data length is taken from statistics density vector)
2) For use case, when you use one parameterized filter or literal,
The estimated number of rows will be selected from the histogram.
For example: Input parameters in a stored procedure or sp_executesql
SELECT *
FROM dbo.Customers
WHERE CustomerId = 123
3) When you have a query that uses more than one column predicate/filter:
SELECT *
FROM dbo.Customers
WHERE CustomerId = 123
AND Name ='XYZ'
Table Cardinality - Total Number of Rows In Table
Selectivity(S) = (# rows that pass the predicate) / (# rows in the table)
S1 means selectivity of the first filter, S2 - selectivity of the second filter and etc
Let`s see what was has changed in the new cardinality estimator compared
to the legacy estimator:
Let`s see what was has changed in the new cardinality estimator compared
to the legacy estimator:
According to Microsoft, regrading the JOIN Estimates:
This means that, for a join predicate that involves an equijoin operation for two tables,
the joined columns exist on both sides of the join.
In the presence of additional non-join filter predicates against the join table,
the Legacy CE assumes some level of correlation for the join predicates
and non-join filter predicates. This implied correlation is called Simple Containment.
Alternatively, the New CE uses Base Containment as the correlation.
The New CE model assumes that users might query for data that does not exist.
This means that the filter predicates on separate tables may not be correlated with each other.
Therefore, we use a probabilistic approach
No comments:
Post a Comment