Filtered Statistics, Legacy vs New CE,
Trace flag 4199...
Do you know these words?
If not, continue reading.
If you are stuck with a SQL query that doesn't perform well,
here is a checklist of things that may help you get ideas.
This is a high-level checklist without deep details, just to give you some options to try.
Please note: there is no magic that can fix a crappy query, so the first step
you should take is to check if you can rewrite the query.
1) Search for non-SARGable filters that prevent usage of existing index(es),
like a COLLATE clause, DATE/STRING/ISNULL functions applied to table columns,
or LIKE '%SearchString' with a leading wildcard.
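For example, a minimal sketch of such a rewrite (dbo.Orders and OrderDate are hypothetical, with an index assumed on OrderDate):

-- Non-SARGable: the function on the column blocks an index seek
SELECT Id FROM dbo.Orders WHERE YEAR(OrderDate) = 2023;

-- SARGable rewrite: compare the bare column against a range
SELECT Id FROM dbo.Orders
WHERE OrderDate >= '20230101' AND OrderDate < '20240101';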
2) Search for a big difference between the actual and estimated number of rows
Your goal is to have the actual and estimated row counts match as closely as possible.
Possible reasons for a mismatch: usage of table variables and/or local variables,
parameter sniffing, non-indexed columns, outdated statistics, non-SARGable filters,
and implicit conversions.
3) Look for thick data arrows
They indicate a high number of rows passing from one operator to another.
4) Search for table/index scans
See if adding an index on the filtered/joined columns can help.
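If a scan turns out to be caused by a missing index, a minimal sketch of the fix (all names are hypothetical):

-- Index the join/filter column; INCLUDE covers the selected columns
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId)
INCLUDE (OrderDate, Total);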
5) Index Seek with a residual Predicate
When you have an index seek operator with a “Predicate” (not a “Seek Predicate”),
it means the filtering is done after the seek operation.
Compare “Number of Rows Read” vs “Actual Number of Rows”.
Also, you can use OPTION (QUERYTRACEON 9130);
it will add a “Filter” operator to the execution plan after the index seek operator.
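A minimal sketch (hypothetical table, with an index assumed on DisplayName that includes Reputation):

SELECT Id, DisplayName, Reputation
FROM dbo.Users
WHERE DisplayName = N'Alice'  -- seek predicate (matches the index key)
  AND Reputation > 1000       -- residual predicate, applied after the seek
OPTION (QUERYTRACEON 9130);   -- shows the residual filtering as a separate Filter operator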
6) Spools
Spools save intermediate results for later reuse in TempDB.
They are hidden tables in TempDB, and they are often caused by
a lack of information about the uniqueness of the data.
What can help is to create a PK/unique key on the involved tables/inner results.
Try to add DISTINCT to the involved tables/inner queries.
And of course, see if adding an index or saving intermediate
results to a temp table helps.
Try to break a big query into small parts; use the «divide et impera» method
(divide and rule, or divide and conquer strategy).
You can run the query with OPTION (QUERYTRACEON 8690) to test how
the query runs without the spool; this trace flag disables the usage of spools.
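A minimal sketch of testing a query without spools (table names are hypothetical):

SELECT c.CustomerId, o.Total
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId
OPTION (QUERYTRACEON 8690);  -- spools disabled for this query only

On SQL Server 2016 SP1 and later, OPTION (USE HINT('NO_PERFORMANCE_SPOOL')) achieves the same without requiring sysadmin rights.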
7) Scalar User-Defined Functions (UDFs)
They may prevent index usage and proper row estimation/cardinality.
You can easily spot them by checking the “Estimated” execution plan,
or you can find the UDF execution times in the actual execution plan
(see UdfElapsedTime/UdfCpuTime).
Try to rewrite the query, inlining the UDF logic directly into the query.
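A minimal sketch of such an inlining (the UDF and both tables are hypothetical; the UDF body is assumed to be a simple lookup):

-- Before: the scalar UDF runs once per row
SELECT o.Id, dbo.fn_GetStatusName(o.StatusId) AS StatusName
FROM dbo.Orders AS o;

-- After: the lookup inlined as a JOIN
SELECT o.Id, s.StatusName
FROM dbo.Orders AS o
JOIN dbo.Statuses AS s
    ON s.StatusId = o.StatusId;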
8) Common Table Expressions (CTEs)
A CTE is not materialized; all tables involved in the CTE are accessed
each time you reference the CTE.
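A minimal sketch (hypothetical query) of materializing a CTE that is referenced twice:

-- Referencing the CTE twice reads dbo.Orders twice
;WITH Totals AS (
    SELECT CustomerId, SUM(Total) AS Amount
    FROM dbo.Orders
    GROUP BY CustomerId
)
SELECT t1.CustomerId
FROM Totals AS t1
JOIN Totals AS t2 ON t2.Amount > t1.Amount;

-- Materializing it into a temp table reads dbo.Orders once
SELECT CustomerId, SUM(Total) AS Amount
INTO #Totals
FROM dbo.Orders
GROUP BY CustomerId;

SELECT t1.CustomerId
FROM #Totals AS t1
JOIN #Totals AS t2 ON t2.Amount > t1.Amount;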
9) Parallelism – Check if the query uses parallelism
Usage of UDFs and table variable modifications can force the query to
be executed in a serial way.
1. Try to execute the query with parallelism:
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'))
2. Try to execute the query in a serial way and see if there is a difference:
OPTION (MAXDOP 1)
10) Legacy vs New Cardinality Estimator (CE)
If you have multiple predicates and JOINs to other tables/views,
it is worth seeing if there is a difference between the legacy CE
and the new CE:
OPTION (QUERYTRACEON 9481) -- Legacy CE
OPTION (QUERYTRACEON 2312) -- New CE
11) Compatibility level
Try to execute the query with a lower or higher compatibility level:
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_110')) -- SQL 2012 --> Old CE
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_120')) -- SQL 2014 --> New CE
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_130')) -- SQL 2016
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140')) -- SQL 2017
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150')) -- SQL 2019
12) Trace flag 4199
Try to execute the query with
OPTION (QUERYTRACEON 4199)
It instructs SQL Server to enable query optimizer hotfixes that are disabled by default.
13) Batch Mode vs Row Mode processing
If the query processes many rows,
it is worth seeing if Batch Mode will be helpful.
To enable Batch Mode on a rowstore table, you can try one of the following:
- Create an empty temp table with a clustered columnstore index
and add a LEFT JOIN to the temp table to the query
(a fuller sketch follows after this list):
CREATE TABLE #tmp (id INT, INDEX Tmp_id CLUSTERED COLUMNSTORE)
....
LEFT JOIN #tmp ON 1 = 0
- Create a dummy filtered columnstore index on an existing table involved
in the query, something like:
CREATE NONCLUSTERED COLUMNSTORE INDEX ix_dummy_nc ON dbo.tablename(Col1) WHERE col1 = -1
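A fuller sketch of the empty-temp-table trick (the query and all names are hypothetical):

CREATE TABLE #tmp (id INT, INDEX Tmp_id CLUSTERED COLUMNSTORE);

SELECT o.CustomerId, SUM(o.Total) AS Amount
FROM dbo.Orders AS o
LEFT JOIN #tmp AS t ON 1 = 0  -- never matches; only makes batch mode available
GROUP BY o.CustomerId;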
14) Parameter Sniffing, Table Variables
Stored procedures with input parameters and sp_executesql with parameters
will sniff the values of the parameters at the first execution
and use the statistics histogram to build the execution plan accordingly.
Try OPTION (RECOMPILE) if you have parameters and/or table variables in the query,
or use other known techniques to handle parameter sniffing issues.
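A minimal sketch (hypothetical procedure) of recompiling per execution so the plan always fits the current parameter value:

CREATE OR ALTER PROCEDURE dbo.GetOrders @CustomerId INT
AS
SELECT Id, Total
FROM dbo.Orders
WHERE CustomerId = @CustomerId
OPTION (RECOMPILE);  -- plan built for the sniffed value of this execution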
15) Order of joins (Row Goals)
See if the existing order of joins in the execution plan makes sense.
If not, you can control it by using row goals:
Add TOP (2147483648) to CTEs or inner queries that you want to be executed first.
To reduce the number of estimated rows, try using TOP (@i) with OPTION (OPTIMIZE FOR (@i = 1)).
To increase the number of estimated rows, try using Adam Machanic's dbo.Many function.
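A minimal sketch of the TOP (@i) trick (names are hypothetical):

DECLARE @i BIGINT = 2147483648;

SELECT TOP (@i) o.Id, o.Total
FROM dbo.Orders AS o
WHERE o.CustomerId = 42
OPTION (OPTIMIZE FOR (@i = 1));  -- the optimizer estimates one row from this branch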
16) Filtered Statistics
With only 200 histogram steps, statistics can be inaccurate, especially when you have skewed data.
When you have joins to lookup tables, or when you join fact to dimension tables,
filtered statistics can help improve the cardinality estimation.
In stored procedures you might add OPTION (RECOMPILE), so the optimizer will pick up
the filtered statistics.
You can see the statistics that have been used in the actual execution plan.
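A minimal sketch of creating filtered statistics (fact table, column, and filter value are hypothetical):

-- One statistics object for a skewed slice of the data
CREATE STATISTICS st_FactSales_Customer_Country5
ON dbo.FactSales (CustomerId)
WHERE CountryId = 5;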
17) SET OPTIONS
Compare application connection settings with SSMS connection settings.
For example, check if your application connects with ARITHABORT OFF,
and check other SET options like SET ANSI_NULLS and SET ANSI_PADDING.
By default, ADO.NET sets ARITHABORT to OFF and SSMS sets it to ON.
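A minimal sketch of comparing session settings using sys.dm_exec_sessions:

SELECT session_id, program_name, arithabort, ansi_nulls, ansi_padding
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;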
18) Views (Nested Views)
How do you optimize views, or queries that use views?
If your query uses view(s) or nested view(s), you may have a problem
with predicate/filter pushdown.
In case you have queries that use views and these queries cannot be changed,
meaning the view objects cannot be dropped and your only option is to
change the view definition, you can try to use table-valued functions (TVFs).
TVFs can push down the filters and, as a result, allow the optimizer to build
a more optimal execution plan.
For example, if you have a view named vw_MyView,
you can create a TVF with the code of vw_MyView and change the view definition
to do a SELECT from the newly created TVF.
Another option is to use row goals (TOP clauses) in the view/TVF to control
the execution order of subqueries/CTEs.
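A minimal sketch of this pattern (all object names are hypothetical):

-- Inline TVF holding the former view body
CREATE OR ALTER FUNCTION dbo.tvf_MyView ()
RETURNS TABLE
AS RETURN
    SELECT o.Id, o.CustomerId, o.Total
    FROM dbo.Orders AS o;
GO

-- The view keeps its name, but now selects from the TVF
CREATE OR ALTER VIEW dbo.vw_MyView
AS
SELECT Id, CustomerId, Total
FROM dbo.tvf_MyView();
GO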
19) Compare Duration and CPU time
Duration (overall query time) is the total time, including SQL Server CPU,
the network, and the application consuming the data.
If you see a big difference between them
(e.g., CPU is much lower than the total duration),
it can indicate blocking, network issues,
or the way the application consumes the data (RBAR: Row By Agonizing Row).
20) HASH Join Hint
If many rows are processed, a NESTED LOOP join is used where it does not seem
reasonable to you, and you cannot find a way to make the estimation
more accurate, try to add a HASH join hint,
like INNER HASH JOIN dbo.table, as in the sketch below.
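A minimal sketch of forcing a hash join (table names are hypothetical):

SELECT o.Id, c.Name
FROM dbo.Orders AS o
INNER HASH JOIN dbo.Customers AS c
    ON c.Id = o.CustomerId;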
21) IN / OR Clauses
1. If you have an IN (Value1, Value2) clause in the query, or you have an OR filter,
try to rewrite the IN (Value1, Value2) as a UNION.
Same with OR filters.
It may help the optimizer to build a more correct execution plan
and use individual index seeks.
2. If you have something like IN (SELECT Col FROM @Tbl),
try to convert the IN to an INNER JOIN,
especially when you have a high number of rows in @Tbl.
Both rewrites are sketched below.
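A minimal sketch of both rewrites (all names are hypothetical):

-- 1. IN (Value1, Value2) / OR rewritten as UNION,
--    so each value can get its own index seek
SELECT Id FROM dbo.Orders WHERE Status = N'Open'
UNION
SELECT Id FROM dbo.Orders WHERE Status = N'Pending';

-- 2. IN (SELECT Col FROM @Tbl) rewritten as an INNER JOIN
DECLARE @Tbl TABLE (Col INT PRIMARY KEY);

SELECT o.Id
FROM dbo.Orders AS o
INNER JOIN @Tbl AS t
    ON t.Col = o.CustomerId;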
22) Small hints/tricks
1) To remove spilling to TempDB of a sort operator,
you can add OPTION (MIN_GRANT_PERCENT = 1).
SQL Server will then grant memory to the query in the following way:
(Maximum SQL Server memory * 90%) * 25% * (min_grant_percent value)
2) If you have a window function with an OVER clause,
try to use ROWS UNBOUNDED PRECEDING;
it will do the work in memory.
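A minimal sketch (hypothetical table); without an explicit frame the default is RANGE, which uses an on-disk worktable, while ROWS can be processed in memory:

SELECT CustomerId, OrderDate,
       SUM(Total) OVER (PARTITION BY CustomerId
                        ORDER BY OrderDate
                        ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM dbo.Orders;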
3) To check existence, try to use NOT EXISTS
instead of LEFT JOIN. In most cases, it will perform better,
stopping at the first matching value.
SELECT COUNT_BIG(u.Id) AS records
FROM dbo.Users AS u
WHERE NOT EXISTS ( SELECT 1/0
FROM dbo.Posts AS p
WHERE p.OwnerUserId = u.Id )
4) ISNULL vs COALESCE
If you use ISNULL/COALESCE on a filtered column,
like WHERE ISNULL(Column, 10) = 10
or WHERE COALESCE(Column, 10) = 10,
and the column in the table is not nullable, then use ISNULL,
because ISNULL can detect that the column is not nullable
and as a result can use an index seek, while COALESCE will do an index scan.
23) As the last resort
Try to execute the query with various query hints:
SELECT * FROM sys.dm_exec_valid_use_hints
Note: you must be very careful with hints;
you may hurt performance if you do not understand how a hint works
and do not take into account your data, workload,
and production environment.