Pages

Monday, October 28, 2019

Control the join order of tables

Have you ever been in a situation where you need to optimize a complex query
with multiple joins or a view with multiple joins?
And the query plan that SQL Server builds could be better 😊

If the query in a stored procedure, you are lucky, and you can change 
it completely by breaking the query into small pieces
And saving the intermediate results into a temp table(s). 
It will give you a better plan, better statistics and there is no need to use any hints.
And by this, you can reduce the number of rows that each part of the query will process.

But what if your query is not inside a stored procedure and/or you cannot use
the technique of splitting it to smaller joins and using temp tables
for keeping intermediate results? 
Yet, you want a way to control the join order of the tables?

For example, you have 3 tables participated in INNER joins(Table1, Table2 & Table3)
And you want to control the ORDER in which SQL Server will do the joins:
Maybe first to do join between Table3 and Table1 and then join the result to Table2.

As you are aware, when you have INNER/CROSS joins (NOT LEFT/RIGHT Joins)
in your query, the order in which you write the Joins doesn’t make any difference,
because the query optimizer will decide how to do it.

The SQL Server doesn’t take into account the order of tables that you have
in FROM clause(Inner joins)

Either, rewriting the query to use CTE or inline Subqueries
or to use parentheses probably will not help, 
because SQL Server will decide on its own how to get the data.
If you already did all the right work(statistics, indexes) and you can see 
a better plan for your query than the SQL Server creates,
The below options can be helpful:

One (not attractive and not recommended) way is to use
OPTION(FORCE ORDER) hint.
One of the drawbacks of OPTION(FORCE ORDER) hint is that 
all joins between the listed table will be affected,

Meaning all joins will be performed in the order how they show in the query
And you cannot control just specific 2-3 joins order.

The second option is more delicate.
You can use the TOP clause in your CTE or inline Subquery. This will force the optimizer to do the joins first, in order to satisfy the TOP condition,
Meaning SQL server will have to perform the JOINs first and/or together as a group.

You can specify a larger number like below not to limit the actual number of rows
TOP(2147483647)

The source of this ‘trick’/technique comes from Adam Machanic
https://sqlbits.com/Sessions/Event14/Query_Tuning_Mastery_Clash_of_the_Row_Goals

2 comments:

  1. Don't you think the limit operation (TOP N) is also costs and potentially could depreciate the benefits of using the preferred tables order?


    - misprint-
    joins(Table1, Table2 & Table3) ... TableC and Table A and then join the result to TableB.

    ReplyDelete
  2. Thanks for the typo notification.
    Yes, TOP N can also hurt :),
    Therefore you should not use it blindly.

    ReplyDelete