
Monday, July 24, 2023

NonParallelPlanReason="CouldNotGenerateValidParallelPlan" - one of the possible reasons

When you see that your query doesn't go parallel, in the execution
plan XML you may find NonParallelPlanReason="CouldNotGenerateValidParallelPlan".
Until SQL Server 2022 there is no indication of why, or what the reason is.

One of the possible reasons can be that you are querying a table that has
a scalar function (UDF) in a check constraint or in a computed column.

Scalar UDFs in check constraints and in computed columns prevent parallelism (even for SELECT queries).

Try to disable the check constraint and check if the query goes parallel, as in the sketch below.
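
A minimal repro sketch (the table and UDF names are hypothetical):

-- Scalar UDF used by a check constraint
CREATE FUNCTION dbo.fn_CheckAmount (@Amount MONEY)
RETURNS BIT
AS
BEGIN
    RETURN (CASE WHEN @Amount >= 0 THEN 1 ELSE 0 END);
END
GO

CREATE TABLE dbo.Orders
(
    OrderID INT IDENTITY PRIMARY KEY,
    Amount  MONEY,
    CONSTRAINT CK_Orders_Amount CHECK (dbo.fn_CheckAmount(Amount) = 1)
);
GO

-- SELECT queries against dbo.Orders cannot get a parallel plan.
-- Disable the constraint and re-test the query:
ALTER TABLE dbo.Orders NOCHECK CONSTRAINT CK_Orders_Amount;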

Tuesday, April 21, 2020

A practical SQL query performance tuning checklist (23 Steps)

Row Goals, Residual Predicate, Batch Mode,
Filtered Statistics, Legacy vs New CE,
Trace flag 4199...

Do you know these words?
If not, continue reading.


If you are stuck with a SQL query that doesn't perform well,
here is a checklist of things that may help you to get ideas.
This is a high-level checklist without deep details, just to give you some options to try.
Please note: there is no magic that can fix a crappy query, so the first step
you should take is to check if you can rewrite the query.

1) Search for non-SARGable filters that prevent usage of existing index(es)
    Like the COLLATE clause, DATE/STRING/ISNULL functions on table columns,
    or using LIKE '%SearchString'
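
For example, a sketch of a non-SARGable filter and its rewrite (assuming a hypothetical dbo.Orders table with an index on OrderDate):

-- Non-SARGable: the function on the column prevents an index seek
SELECT OrderID FROM dbo.Orders WHERE YEAR(OrderDate) = 2020;

-- SARGable rewrite: the bare column allows an index seek
SELECT OrderID FROM dbo.Orders
WHERE OrderDate >= '20200101' AND OrderDate < '20210101';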

2) Check Estimated vs Actual number of rows
    Search for a big difference between the actual and estimated number of rows.
    Your goal is to have the same number of actual and estimated rows as much as possible.
    Possible reasons for a mismatch: usage of table variables and/or local variables,
    parameter sniffing, non-indexed columns, outdated statistics, non-SARGable filters,
    implicit conversion.

3) Look for thick data arrows
    They indicate a high number of rows passing from operator to operator.
    Search for table/index scans, and see if adding an index on filtered/joined fields can help.

4) See if there is an Index Seek with a Residual Predicate (Seek Predicate vs Predicate)
    When you have an index seek operator with a predicate (not a "seek predicate"),
    it means the filtering is done after the seek operation.

    Compare "Number of Rows Read" vs "Actual Number of Rows".
    Also, you can use OPTION (QUERYTRACEON 9130);
    it will add a "Filter" operator to the execution plan after the index seek operator.

5) Search for Spool operators like Eager/Lazy/Table spools
    Spools save intermediate results in TempDB for later reuse.
    They are hidden tables in TempDB, and they are often caused by
    a lack of information about the uniqueness of the data.
    What can help is to create a PK/Unique Key on the involved tables/inner results.
    Try to add DISTINCT to involved tables/inner queries.
    And of course, see if adding an index or saving intermediate
    results to a temp table helps.
    Try to break a big query into small parts, use the «divide et impera» method
    (divide and rule, or divide and conquer strategy).
    You can run the query with "OPTION (QUERYTRACEON 8690)" to test how
    the query will run without the spool. This option disables the usage of spools.

6) Search for “Implicit conversion” warnings
    They may prevent index usage and proper row estimation/cardinality.
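
A typical sketch (assuming a hypothetical dbo.Customers table where CustomerCode is VARCHAR):

-- The NVARCHAR literal forces CONVERT_IMPLICIT on the VARCHAR column,
-- which can prevent an index seek and skew the estimates
SELECT CustomerID FROM dbo.Customers WHERE CustomerCode = N'ABC';

-- Matching the column's data type avoids the implicit conversion
SELECT CustomerID FROM dbo.Customers WHERE CustomerCode = 'ABC';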

7) See if UDFs (scalar/table-valued functions) are used
    You can easily spot them by checking the "Estimated" execution plan,
    or you can find the UDF execution times in the actual execution plan,
    see UdfElapsedTime/UdfCpuTime.
    Try to rewrite the query and inline the UDF logic directly into it.

8) If you use a CTE, see if you reference the CTE multiple times.
    A CTE is not materialized, so all tables involved in the CTE
    are accessed each time you reference it.
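
A sketch of the usual workaround, materializing the result once into a temp table (all names are hypothetical):

-- Persist the intermediate result once
SELECT CustomerID, SUM(Amount) AS Total
INTO   #CustomerTotals
FROM   dbo.Orders
GROUP BY CustomerID;

-- Both references now read the temp table, not the base tables
SELECT * FROM #CustomerTotals WHERE Total > 1000
UNION ALL
SELECT * FROM #CustomerTotals WHERE Total < 10;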

9) Parallelism – Check if the query uses parallelism
    Usage of UDFs and table variable modifications can force the query to
    be executed serially.

      1. Try to execute the query with parallelism
            OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'))
      2. Try to execute the query in a serial way
          and see if there is a difference
             OPTION(MAXDOP 1)

10) Legacy vs New Cardinality Estimator (CE)
       If you have multiple predicates and JOINs to other tables/views,
       it is worth checking if there is a difference between the Legacy Cardinality Estimator (CE)
       and the New Cardinality Estimator (CE)

         OPTION(QUERYTRACEON 9481) -- Legacy CE
         OPTION(QUERYTRACEON 2312) -- New CE

11) Compatibility Level
      Try to execute the query with a lower or higher compatibility level
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_110')) -- SQL 2012 --> OLD CE
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_120')) -- SQL 2014 --> New CE
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_130')) -- SQL 2016
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140')) -- SQL 2017
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150')) -- SQL 2019

12) Trace Flag 4199 (Enable Optimizer Fixes)
      Try to execute the query with
        OPTION(QUERYTRACEON 4199)

      It will instruct SQL Server to enable query optimizer fixes that are disabled by default.

13) Batch Mode vs Row Mode processing
      If the query processes many rows,
      it is worth checking if Batch Mode will be helpful.

      To enable Batch Mode on a rowstore table you can try one of the following:
  •       Create an empty temp table with a clustered columnstore index
          and add a LEFT JOIN to the temp table to the query
CREATE TABLE #tmp (id INT, INDEX Tmp_id CLUSTERED COLUMNSTORE)
....
LEFT JOIN #tmp ON 1 = 0
  • Create a dummy filtered ColumnStore index on the existing table involved
    in the query, something like:

    CREATE NONCLUSTERED COLUMNSTORE INDEX ix_dummy_nc ON dbo.tablename(Col1) WHERE col1 = -1

14) Parameter Sniffing, Table Variables
      Stored procedures with input parameters and sp_executesql with parameters
      will sniff the values of the parameters at the first execution,
      use the statistics histogram, and build the execution plan accordingly.
      Try OPTION(RECOMPILE) if you have parameters and/or table variables in the query,
      or use other known techniques to handle parameter sniffing issues.

15) Order of JOINs, reduce/increase the estimated number of rows
      See if the existing order of joins in the execution plan makes sense.
      If not, you can control it by using ROW GOALS:
      add TOP (2147483647) to CTEs or inner queries that you want to be executed first.
      To reduce the number of estimated rows, try using TOP(@i) with OPTION(OPTIMIZE FOR (@i=1)),
      as in the sketch below.
      To increase the number of estimated rows, try using Adam Machanic's dbo.Many function.
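
A minimal sketch of the row-goal trick (the table is hypothetical; the optimizer plans for @i = 1, while at runtime the query returns everything):

DECLARE @i BIGINT = 9223372036854775807; -- effectively no limit at runtime

SELECT TOP(@i) o.OrderID, o.Amount
FROM dbo.Orders AS o
OPTION (OPTIMIZE FOR (@i = 1)); -- the optimizer estimates 1 row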

16) Filtered Statistics
      With only 200 steps in the histogram, statistics can be inaccurate, especially when you have skewed data.
      When you have joins to lookup tables, or when you join fact to dimension tables,
      filtered statistics can help to improve the cardinality estimation (see the sketch below).
      In SPs you might add OPTION(RECOMPILE), so the optimizer will pick up
      the filtered statistics.
      You can see the statistics that have been used in the actual execution plan.
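
A minimal sketch (the table, column, and filter value are hypothetical):

-- Statistics built only over the skewed slice of the data
CREATE STATISTICS st_Orders_Status_Open
ON dbo.Orders (CustomerID)
WHERE Status = 'Open';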

17) SET OPTIONS
      Compare the application connection settings with the SSMS connection settings.
      For example, check if your application connects with ARITHABORT OFF,
      and check other SET options like SET ANSI_NULLS, SET ANSI_PADDING.
      By default, ADO.NET sets ARITHABORT to OFF, while SSMS sets it to ON.

18) Views (Nested Views)
       How to optimize views? Or how to optimize queries that use views?
       If your query uses view(s) or nested view(s), you may have a problem
       with predicate/filter push-down.

       In case you have queries that use views and these queries cannot be changed,
       meaning the view objects cannot be dropped and your only option is to
       change the view definition, you can try to use table-valued functions (TVFs).

       TVFs can push down the filters and, as a result, allow the optimizer to build
       a more optimal execution plan.
       For example, if you have a view named vw_MyView,
       you can create a TVF with the code of vw_MyView and change the view definition
       to do a SELECT from the newly created TVF, as sketched below.
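
A sketch of the idea (all object names are hypothetical):

CREATE FUNCTION dbo.tvf_MyView ()
RETURNS TABLE
AS
RETURN
(
    SELECT o.OrderID, o.CustomerID, o.Amount
    FROM dbo.Orders AS o
);
GO

ALTER VIEW dbo.vw_MyView
AS
SELECT OrderID, CustomerID, Amount
FROM dbo.tvf_MyView();
GO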

       Another option is to use Row Goals (TOP Clauses) in the view/TVF to control
       the execution order of subqueries/CTEs.

19) Compare Duration and CPU time
      Duration (overall query time) is the total time, including SQL Server CPU,
      network, and the application getting the data.
      If you see a big difference between them
      (e.g., CPU time is much less than the query duration),
      it can indicate blocking or network issues,
      or the way the application consumes the data (RBAR: Row By Agonizing Row).

20) HASH Join Hint
       If many rows are processed, a NESTED LOOP join is used and it does not
       look reasonable to you, and you cannot find a way to make the estimation
       more accurate, try to add a HASH join hint,
       like INNER HASH JOIN dbo.table

21) IN / OR Clauses
      1. If you have an IN (Value1, Value2) clause in the query, or you have an OR filter,
          try to rewrite the IN (Value1, Value2) to a UNION.
          Same with OR filters.
          It may help the optimizer to build a more correct execution plan
          and use individual index seeks.

     2. If you have something like IN (SELECT Col FROM @Tbl),
         try to convert the IN to an INNER JOIN,
         especially when you have a high number of rows in @Tbl.
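
Sketches of both rewrites (table and column names are hypothetical; @Tbl is the table variable from the example above):

-- 1. IN / OR rewritten as UNION: each branch can use its own index seek
SELECT OrderID FROM dbo.Orders WHERE Status = 'Open'
UNION
SELECT OrderID FROM dbo.Orders WHERE Status = 'Pending';

-- 2. IN (SELECT ...) rewritten as an INNER JOIN
SELECT o.OrderID
FROM dbo.Orders AS o
INNER JOIN @Tbl AS t ON t.Col = o.CustomerID;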

22) Small hints/tricks
      
       1) To remove spilling to TempDB by a sort operator,
           you can add OPTION(MIN_GRANT_PERCENT = 1).
           SQL Server will grant memory to the query in the following way:
             (Maximum SQL Server memory * 90%) * 25% * (min_grant_percent value)

        2) If you have window functions with an OVER clause,
            try to use ROWS UNBOUNDED PRECEDING;
            it will do the work in memory.

       3) To check existence, try to use NOT EXISTS
             instead of LEFT JOIN. In most cases, it will perform better by
             stopping at the first existing value.

                 SELECT COUNT_BIG(u.Id) AS records
                 FROM   dbo.Users AS u
                 WHERE  NOT EXISTS ( SELECT 1/0
                                     FROM   dbo.Posts AS p
                                     WHERE  p.OwnerUserId = u.Id )

       4) ISNULL vs COALESCE
           If you use ISNULL/COALESCE on a filtered column,
           like WHERE ISNULL(Column,10) = 10
           or WHERE COALESCE(Column,10) = 10,
           and the column in the table is not nullable, then use ISNULL,
           because ISNULL can detect that the column in the table is not nullable
           and as a result can use an index seek. COALESCE will do an index scan.
     
23) As the last resort
      Try to execute the query with various query hints:
       SELECT * FROM sys.dm_exec_valid_use_hints


Note: You must be very careful with hints;
you may hurt the performance if you
do not understand how they work
and do not take into account your data, workload,
and production environment.

Monday, February 17, 2020

Interesting SQL Server Cursors locking observation

If you are using SQL Server cursors,
you should be aware of the following locking behavior of a cursor. 

Let's take a simple example of a cursor in SQL Server;
we will use the cursor just to iterate through rows in a table.

DECLARE @CustomerID INT

DECLARE Customers_Cursor CURSOR FOR

SELECT CustomerID
FROM dbo.Customers

OPEN Customers_Cursor

FETCH NEXT FROM Customers_Cursor INTO @CustomerID

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @CustomerID

    FETCH NEXT FROM Customers_Cursor INTO @CustomerID
END

CLOSE Customers_Cursor
DEALLOCATE Customers_Cursor


What we want to check is which locks are acquired and how many times.
Let's assume that the "Customers" table has a PK on the CustomerID column.

If you set up an Extended Events session that captures the "lock_acquired"
and "lock_released" events, you will see that,

for the same key/row, a shared lock is acquired and released twice:
the first time upon the OPEN statement,
and the second time upon the FETCH statement.

So if you have deadlocks involving cursors and shared locks,
keep this behavior in mind.
If all you want to do is just iterate through the rows,
without jumping back and forward
and without updating rows in the cursor's table,
you can use the FAST_FORWARD cursor type:
DECLARE Customers_Cursor CURSOR FAST_FORWARD FOR

Friday, November 22, 2019

Table Variables - Serial plan upon data modification statements and OPTION(RECOMPILE)

Here is something to be aware of:
when you have data modification statements
(like INSERT, UPDATE, DELETE) against a table variable,
SQL Server will not use a parallel plan; it will use a serial plan.
This can affect the performance, so keep it in mind.

However, there is a workaround:
INSERT INTO @Table
EXEC('SELECT .. FROM ...')

Additionally,
until SQL Server 2019, which has the "table variable deferred compilation" feature,
the estimated number of rows of a table variable was always 1.
Why? When a query/SP/batch is compiled and optimized, the table variable is empty.
If you want proper estimations, you need to use the
OPTION(RECOMPILE) hint or trace flag 2453,
so the optimizer will use the real row count (see the sketch below).

Note: the OPTION(RECOMPILE) hint will not always work when
you join the table variable to a table and have a 1-to-many relationship.
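
A minimal sketch (the tables are hypothetical):

DECLARE @Ids TABLE (Id INT PRIMARY KEY);

INSERT INTO @Ids (Id)
SELECT Id FROM dbo.Users;

-- Without the hint the optimizer assumes 1 row in @Ids;
-- with it, the statement is recompiled using the real row count
SELECT u.Id, u.DisplayName
FROM dbo.Users AS u
INNER JOIN @Ids AS i ON i.Id = u.Id
OPTION (RECOMPILE);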

See https://www.sqlservercentral.com/blogs/should-i-use-optionrecompile for more details.

Also, the situation is different when you have some filters on your table variable.
Because there are no statistics on table variables,
SQL Server will use the following rules for the estimation of rows:

- When there is an equality filter, the estimate will be 10% of the total rows
- When there are inequality filters, like <, >, <>, the estimate will be 30% of the total rows
- When there is an equality filter on a bit field, the estimate will be 50% of the total rows


Source articles:
https://www.brentozar.com/archive/2017/02/using-trace-flag-2453-improve-table-variable-performance/
https://support.microsoft.com/en-us/help/2952444/fix-poor-performance-when-you-use-table-variables-in-sql-server-2012-o

Monday, October 28, 2019

Control the join order of tables

Have you ever been in a situation where you need to optimize a complex query
with multiple joins or a view with multiple joins?
And the query plan that SQL Server builds could be better 😊

If the query is in a stored procedure, you are lucky, and you can change
it completely by breaking the query into small pieces
and saving the intermediate results into temp table(s).
It will give you a better plan, better statistics, and there is no need to use any hints.
And by this, you can reduce the number of rows that each part of the query will process.

But what if your query is not inside a stored procedure, and/or you cannot use
the technique of splitting it into smaller joins and using temp tables
for keeping intermediate results?
Yet you want a way to control the join order of the tables?

For example, you have 3 tables participating in INNER joins (Table1, Table2 & Table3),
and you want to control the ORDER in which SQL Server will do the joins:
maybe first to join Table3 and Table1, and then join the result to Table2.

As you are aware, when you have INNER/CROSS joins (NOT LEFT/RIGHT joins)
in your query, the order in which you write the joins doesn't make any difference,
because the query optimizer will decide how to do it.

SQL Server doesn't take into account the order of the tables in
the FROM clause (inner joins).

Rewriting the query to use CTEs or inline subqueries,
or using parentheses, probably will not help either,
because SQL Server will decide on its own how to get the data.
If you already did all the right work (statistics, indexes) and you can see
a better plan for your query than the one SQL Server creates,
the options below can be helpful:

One (not attractive and not recommended) way is to use
the OPTION(FORCE ORDER) hint.
One of the drawbacks of the OPTION(FORCE ORDER) hint is that
all joins between the listed tables will be affected,

meaning all joins will be performed in the order in which they appear in the query,
and you cannot control the order of just specific 2-3 joins.

The second option is more delicate.
You can use the TOP clause in your CTE or inline subquery.
This will force the optimizer to do those joins first, in order to satisfy the TOP condition,
meaning SQL Server will have to perform the JOINs first and/or together as a group.

You can specify a large number like below, so as not to limit the actual number of rows:
TOP(2147483647)
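
A sketch of the technique (table and column names are hypothetical):

-- The TOP in the CTE forces Table3 and Table1 to be joined first
WITH FirstPair AS
(
    SELECT TOP(2147483647) t3.KeyCol, t1.Col1
    FROM Table3 AS t3
    INNER JOIN Table1 AS t1 ON t1.KeyCol = t3.KeyCol
)
SELECT fp.Col1, t2.Col2
FROM FirstPair AS fp
INNER JOIN Table2 AS t2 ON t2.KeyCol = fp.KeyCol;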

The source of this ‘trick’/technique comes from Adam Machanic
https://sqlbits.com/Sessions/Event14/Query_Tuning_Mastery_Clash_of_the_Row_Goals

Wednesday, May 2, 2018

Frequently running stored procedure hangs from time to time


We encountered a situation where a frequently
executed stored procedure (about 50 times per second)

hangs for a couple of seconds and blocks all other
requests with this stored procedure.
We are talking about an OLTP system.

In general, this SP runs for 20-30 ms,
but once every 2-3 days we saw its runtime
increase up to 20-30 seconds
(only a single thread/execution), and after that
everything comes back to normal.
Also, when this 'hanging' occurs, all other threads that execute
this SP wait for the problematic thread/execution to finish.

In the blocking event in Profiler, we saw that the other blocked
threads that execute the same SP were blocked due to a compile lock.

The issue was related to the fact that SQL Server,
before executing a SP, first checks if the statistics are up to date,
and in case they are not, it will first update the statistics and only then
continue to execute the SP, re-compiling it and creating a fresh execution plan.

The most interesting part is that all other threads with this SP were blocked
until the process of updating the statistics and creating the fresh execution plan finished.

The reason for the blocking of the rest of the concurrent calls to the same SP
is explained below:

In Microsoft SQL Server, only one copy of a stored procedure plan is generally in cache at a time.
Enforcing this requires serialization of some parts of the compilation process,
and this synchronization is accomplished in part by using compile locks.
If many connections are concurrently running the same stored procedure and a compile lock must be obtained
for that stored procedure every time that it is run, system process IDs (SPIDs)
might begin to block one another as they each try to obtain an exclusive compile lock on the object.


We can control whether we want the AUTO_UPDATE_STATISTICS to occur synchronously
or asynchronously by setting the AUTO_UPDATE_STATISTICS_ASYNC option.

Once we set the AUTO_UPDATE_STATISTICS_ASYNC option to ON, the problem was gone.
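
For example (the database name is a placeholder):

ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;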

SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on
FROM   sys.databases


Saturday, August 6, 2011

How to find stored procedure(s) that scan particular index

When it comes to performance issues, we can get a lot from querying
the cached plans of stored procedures. For example:

  -  we can find which stored procedures (SPs) scan an index
  -  in which SPs we have bookmark lookups / missing indexes
  -  find SPs where different spool operators are being used
  -  get SPs which contain parallel execution
  -  check if an index is being used in some SP(s)

and more.

Let's start with the basics. By using the query below, you can get
the execution plan of a stored procedure, open it, and analyze it.

-- Get execution query plan in XML format
SELECT  OBJECT_NAME(st.objectid, st.dbid) AS ObjectName,
        cp.usecounts AS ExecutionCount,
        qp.query_plan AS QueryPlan
FROM    sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE   cp.objtype = 'Proc'
        AND st.dbid = DB_ID(DB_NAME()) -- Current DB
        AND OBJECT_NAME(st.objectid, st.dbid) = 'SPName'

By running the next query we can find out which stored procedures
scan a particular index.

WITH XMLNAMESPACES('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS ns)
SELECT  OBJECT_SCHEMA_NAME(qp.objectid) AS [schema_name],
        OBJECT_NAME(qp.objectid) AS [object_name],
        qp.query_plan
FROM    sys.dm_exec_cached_plans AS p
        CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
        CROSS APPLY qp.query_plan.nodes('//ns:IndexScan/ns:Object[@Database="[DBNAME]"]
                [@Schema="[dbo]"][@Table="[TblName]"][@Index="[IndexName]"]') plan_xml(x)

Important note: the XML is case sensitive, so insert the table/index names
exactly as they are defined; pay attention to capital letters.

The following query can help you to check if an index is being used;
this can be helpful when you want to delete an index.

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
CachedPlansCTE (DatabaseName, SchemaName, TableName, IndexName, QueryText, QueryPlan, [object_name])
AS
(
SELECT  ObjNode.op.value(N'@Database', N'varchar(128)') AS DatabaseName,
        ObjNode.op.value(N'@Schema', N'varchar(128)') AS SchemaName,
        ObjNode.op.value(N'@Table', N'varchar(128)') AS TableName,
        ObjNode.op.value(N'@Index', N'varchar(128)') AS IndexName,
        cp.[text] AS QueryText,
        cp.query_plan AS QueryPlan,
        cp.[object_name]
FROM (
        SELECT  query.[text], qp.query_plan, OBJECT_NAME(qp.objectid) AS [object_name]
        FROM    sys.dm_exec_cached_plans AS p
                OUTER APPLY sys.dm_exec_sql_text(plan_handle) AS query
                OUTER APPLY sys.dm_exec_query_plan(plan_handle) AS qp
        WHERE   query.[text] NOT LIKE '%sys%'
                AND p.cacheobjtype = 'Compiled Plan'
                AND p.objtype = 'Proc'
) cp
CROSS APPLY cp.query_plan.nodes(N'//Object') ObjNode (op)
)
SELECT  [object_name], DatabaseName, SchemaName, TableName, IndexName, QueryPlan, QueryText
FROM    CachedPlansCTE
WHERE   IndexName LIKE '%IndexName%'
        AND QueryText NOT LIKE '%insert%'
        AND QueryText NOT LIKE '%update%'
OPTION (MAXDOP 1)


Sources:
http://sqlconcept.com/2011/07/12/how-to-query-the-cached-plans-xml/
http://blogs.sqlserver.org.au/blogs/greg_linwood/archive/2008/04/27/1229.aspx

Wednesday, January 26, 2011

How to use performance counters in sys.dm_os_performance_counters DMV


As you probably already know,
in the DMV called 'sys.dm_os_performance_counters' you can find
many performance counters related to SQL Server.
You can use them instead of the counters from the Windows
Performance Monitor application.

But there is a catch you should be aware of.

There are 3 types of counters, and a counter's value
calculation depends on the type of the counter.

The counter types are:

Value/Base:
In order to get the right value for this kind of counter, you have
to take the counter with cntr_type = 537003264 and divide it by
the 'base' counter with cntr_type = 1073939712.

For example:
("Buffer cache hit ratio" / "Buffer cache hit ratio base") * 100
will give you the 'Buffer cache hit ratio'.

SELECT object_name, counter_name, cntr_value, cntr_type
FROM   sys.dm_os_performance_counters
WHERE  (counter_name = 'Buffer cache hit ratio'
        OR counter_name = 'Buffer cache hit ratio base')
       AND object_name LIKE '%:Buffer Manager%'

Per Second:
These kinds of counters store cumulative values, meaning the
value must be compared at 2 different times by calculating the
difference between the values. For instance, get a counter value,
save it, then get it again after, say, 5 seconds, and the right
final counter value will be:
(@SecondCounterValue - @FirstCounterValue) / 5 seconds.

These counters have cntr_type = 272696576.

SELECT *
FROM sys.dm_os_performance_counters
WHERE counter_name = 'SQL Compilations/sec'
              AND object_name like '%SQL Statistics%'
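
A minimal sketch of the calculation (the 5-second delay is arbitrary):

DECLARE @First BIGINT, @Second BIGINT;

SELECT @First = cntr_value
FROM   sys.dm_os_performance_counters
WHERE  counter_name = 'SQL Compilations/sec'
       AND object_name LIKE '%SQL Statistics%';

WAITFOR DELAY '00:00:05';

SELECT @Second = cntr_value
FROM   sys.dm_os_performance_counters
WHERE  counter_name = 'SQL Compilations/sec'
       AND object_name LIKE '%SQL Statistics%';

SELECT (@Second - @First) / 5 AS CompilationsPerSec;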

Point In Time:
These are point-in-time counters. They hold the value at the
current point in time. These counters have cntr_type = 65792.

SELECT *
FROM sys.dm_os_performance_counters
WHERE counter_name = 'User Connections'
              AND object_name Like '%General Statistics%'

Example for calculating the Buffer cache hit ratio:

SELECT (a.Val * 1.0 / b.BaseValue) * 100.0 AS [BufferCacheHitRatio]
FROM (
    SELECT cntr_value AS Val
    FROM   sys.dm_os_performance_counters
    WHERE  counter_name = 'Buffer cache hit ratio'
           AND object_name LIKE '%:Buffer Manager%'
) AS a
CROSS JOIN
(
    SELECT cntr_value AS BaseValue
    FROM   sys.dm_os_performance_counters
    WHERE  counter_name = 'Buffer cache hit ratio base'
           AND object_name LIKE '%:Buffer Manager%'
) AS b

By knowing the types of the counters, you can write a SP
that will collect the performance data of your server for you.
Here is a very good example of such a SP.

Wednesday, September 22, 2010

How to troubleshoot slow running query

So you have a query or stored procedure that is running slow.
What are the main steps to identify the reason(s),
and how should you change the query?



I assume you already have the query, and I am not going to
talk about how to find the slow queries/stored procedures.

I am not going to talk about CPU/Network/Memory
bottlenecks and other fancy words that generally will not help
us with the performance of a single query or stored procedure.

What I want is to list the main steps my DBA team and I take
when we have a slow running query/SP.

1) Examine the Execution Plan:

Take a look at the query execution plan,

a. Search for Table Scan / Index Scan; these operators are
    significant if you access big tables.
    Check if proper indexes already exist on the tables,
    and find out why SQL Server chose not to use them.
    For example, usage of local variables, parameter sniffing,
    outdated statistics, or heavily fragmented indexes can cause
    a Table/Index scan.

b. Check if you have any key lookups/bookmark lookups.
    In the case of a large table / big result set, this one can lead to
    performance degradation. See if you can create a covering index
    or include the needed columns in a nonclustered index.

c. See if you have Spool operators in the execution plan.
    These operators are blocking operators, meaning they need to
    get all the data at once and not row by row. An excellent
    deep dive into Spool operators can be found here.
    Very briefly: a spool reads the data and saves it in TempDB.
    This process is used whenever the optimizer knows that the
    density of the column is high and the intermediate result is very
    complex to calculate.

d. If you see that there is an index seek operator, check if SQL
    Server chose the right index. There are situations when the Query
    Optimizer can choose the "wrong" index; this can happen when
    you have more than one index on the table that SQL Server
    can use to fetch the data. In such cases, check your query filters
    in the WHERE clause.

e. Compare the Actual number of rows versus Estimated.

If you see a big difference in these numbers, this is a sign
that you are using a wrongly generated execution plan.
Check the statistics, and consider using OPTION(RECOMPILE).

Check in the XML view of the execution plan what
the values of the input parameters were when the SP was
compiled. This can help you to understand why
SQL Server built the execution plan you see.

f. For every operator in every query, you will see
   the operator's cost. By checking this cost, you
   can easily spot the most expensive operators in the query.

One thing to note here:

   Even in the actual execution plan, you can only see
   the estimated costs, and the percentage values represent
   the estimated relative costs.
   Therefore, if there is a big difference between the
   estimated and the actual number of rows, there is a good
   chance that the relative costs are misleading.

2) Analyze the output of STATISTICS IO

By setting the SET STATISTICS IO ON,
SQL Server will display information regarding the amount of disk
activity generated by Transact-SQL statements:

Table 'ProductCostHistory'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
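
The output above can be produced with a minimal sketch like this (assuming the AdventureWorks sample database):

SET STATISTICS IO ON;

SELECT ProductID, StartDate, StandardCost
FROM   Production.ProductCostHistory;

SET STATISTICS IO OFF;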

In the output you need to check the following:

Scan count - the number of index or table scans (accesses) performed.
Logical reads - the number of pages read from the data cache.

These counters are very handy when you compare the changes
you made in the query.

In almost all cases, the reason for the slowness of a query is a high
and not really needed amount of 'pages read' / logical reads.
You need to check if the amount of 'page reads' is really what you
expect it to be; if not, here is your main problem.

By looking at these counters you can see how many times SQL
Server accessed a table and how many pages were read.
Search for a high number of logical reads, and on each change of
the query compare these values.

3) Try to rewrite the query / stored procedure

Check if you use DISTINCT / UNION,
non-SARGable expressions (like !=), functions on columns
in the WHERE clause, scalar functions, or wrong index/join hints.
Use JOINs rather than subqueries.