Monday, August 3, 2009

Why understanding "Logical Query Processing Phases" is important?

As you know a T-SQL code is processed in
different way than a code of C++ or C#.
In programming languages the code is processed
in the order it was written(Line by Line).
In T-SQL this is not true, for example the
FROM clause is the first that processed by
SQL Server. I think that understanding the
"Logical Query Processing Phases" is very
important for a database developer.

By understanding the order of the operations you can

  • Explain the results of complicated queries
  • Understand why aliases in the SELECT can be used
    only in ORDER BY clause
  • Understand why/when to use additional predicates in the
    ON clause (like on tbl1.id=tbl2.id and tbl2.name='Fox')
    instead of in WHERE clause(relevant only to OUTER joins)
  • Write more efficient T-SQL queries

Here is a general form of a T-SQL SELECT query:

SELECT DISTINCT TOP [...] SELECT [...]
FROM [...]
JOIN [...]
ON
WHERE [...]
GROUP BY [...]
WITH CUBE/ROLLUP
HAVING [...]
ORDER BY [...]

And here is the order in which it processed:

1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. CUBE/ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP

Let`s see an example. Assume that there are 2 tables,
the first table is Writers,it contains writers with their ID & name,
the second table is WritersBooks that holds published
books for each writer.

CREATE TABLE Writers
(
WriterID INT NOT NULL PRIMARY KEY,
Fname VARCHAR(50) NOT NULL
)
GO
CREATE TABLE WritersBooks

(
BookID INT NOT NULL,
WriterID INT NOT NULL,
BookName VARCHAR(50) NOT NULL,
CONSTRAINT [PK__WritersBooks] PRIMARY KEY CLUSTERED
( [BookID] ASC, [WriterID] ASC)
)
GO
INSERT INTO
Writers(WriterID,Fname)
SELECT 1,'Tom'
UNION ALL
SELECT 2,'Jack'
UNION ALL
SELECT 3,'Gary'
GO
INSERT INTO
WritersBooks(BookID,WriterID,BookName)
SELECT 100,1,'SQL Internals'
UNION ALL
SELECT 101,1,'SQL Server 2008'
UNION ALL
SELECT 200,2,'My Love Poems'
UNION ALL
SELECT 201,2,'My Story'
UNION ALL
SELECT 203,2,'My Sons'
GO
SELECT * FROM
Writers
SELECT * FROM WritersBooks

WriterID Fname
-- ---------------
1 Tom
2 Jack
3 Gary


BookID WriterID BookName
----- ----------- --------
100 1 SQL Internals
101 1 SQL Server 2008
200 2 My Love Poems
201 2 My Story
203 2 My Sons

You need to write a query that returns per writer number of
books that their name contains "SQL" characters and for
writers with no such books it should return 0 as Num of books.

The following query will return what we asked for:

SELECT Writers.WriterID,COUNT(WritersBooks.BookName) AS NumOfBooks
FROM Writers LEFT OUTER JOIN WritersBooks
ON Writers.WriterID=WritersBooks.WriterID AND WritersBooks.BookName LIKE
'%sql%'
GROUP BY Writers.WriterID

WriterID NumOfBooks
----------- -----------
1 2
2 0
3 0

As you can see there is "AND WritersBooks.BookName LIKE '%sql%' "
predicate in the ON clause.
If this predicate was in the WHERE clause we would not get results for
writers that do not have books with the "SQL" characters:

SELECT Writers.WriterID,COUNT(WritersBooks.BookName) AS NumOfBooks
FROM Writers LEFT OUTER JOIN WritersBooks
ON Writers.WriterID=WritersBooks.WriterID
WHERE WritersBooks.BookName LIKE '%sql%'
GROUP BY Writers.WriterID

WriterID NumOfBooks
----------- -----------
1 2

Now here what happens - the condition(s) in the ON clause
are processed before the actual JOIN (type of Join). In our case
we used LEFT OUTER JOIN that tell to SQL Server to
preserve(keep) all rows from the left table, so SQL Server will
add the rows from the left table that were not produced by
the ON Clause. After the ON clause we will get only writer with
ID=1,the rest 2 writers we added to result by the OUTER JOIN.

Again,if the filter "WritersBooks.BookName LIKE '%sql%'" was in
WHERE clause than SQL Server will filter out the final result
and the result would not contain the other 2 writers.

The point here is that when you use OUTER Join,
the filter in the ON clause in NOT final,because the OUTER
JOIN will add all preserved rows form left or right table according
to the type of the OUTER JOIN.

Consider the following query:

SELECT DISTINCT Writers.Fname
FROM Writers LEFT OUTER JOIN WritersBooks
ON Writers.WriterID=WritersBooks.WriterID AND WritersBooks.BookName NOT LIKE '%sql%'

Fname
--------------------------------------------------
Gary
Jack
Tom

In this query 'Tom' and 'Gary' were added to the results,despite
the fact that we filtered they out by "BookName NOT LIKE '%sql%' "
in the ON clause.

Note: There is no difference whether you provide filters
in the ON clause or in the WHERE clause when there is
INNER JOIN. Because the the step 3 is skipped.
Try it yourself.

Also by looking in the order in which the SQL Server processes
queries you can explain why you cannot use aliases defined
in the SELECT list in other places beside the ORDER BY clause.
You can reuse the aliases only in steps following the SELECT list.

Also you cannot use aliases defined in the SELECT list
in other expressions in the SELECT list. This is due to
"All at once" SQL operators.

There is a great poster about the "Logical Query Processing"
http://www.sql.co.il/books/insidetsql2008/Logical%20Query%20Processing%20Poster.pdf

No comments:

Post a Comment