Wednesday, May 30, 2012

SQL Server has encountered occurrence(s) of I/O requests taking longer than 15 seconds to complete

This morning, while going through my regular SQL Server log reports for one of our production servers, I found an error I had not seen before:
SQL Server has encountered 52 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\TempDB.mdf] in database [TempDB] (2). The OS file handle is 0x00000884. The offset of the latest long I/O is: 0x00000457490000
The first thing I wanted to know about this error was whether it is a critical message.
The answer I found was YES. When we talk about I/O in SQL Server, we think in milliseconds, so a wait of several seconds is far from normal. SQL Server I/O wait times can be examined with the following query:
SELECT  *
FROM    sys.dm_os_wait_stats
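Wait statistics are collected for the whole instance, while the log message names one specific file. To see which files are actually suffering, per-file stall figures can help. The following is only a minimal sketch, assuming SQL Server 2005 or later and VIEW SERVER STATE permission; the column aliases are my own, and the figures are cumulative since the instance last started.

```sql
-- Average I/O stall per read and per write for every database file,
-- worst files (by total stall time) first.
SELECT  DB_NAME(vfs.database_id)                             AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_writes,
        vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
        vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
        JOIN sys.master_files AS mf
          ON  mf.database_id = vfs.database_id
          AND mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall DESC;
```

If the TempDB data file from the message shows up at the top with average stalls far above the other files, that supports the idea that the drive hosting it is the bottleneck.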

How to check your hard drive performance?
To check the server's I/O subsystem, I rely on the Performance Monitor counter PhysicalDisk: Avg. Disk Queue Length. Monitor this counter for at least 10 minutes. If Avg. Disk Queue Length stays above 2 per individual disk drive in the array throughout that period, then you almost certainly have an I/O bottleneck.
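Performance Monitor shows the queue from the operating system side. As a complement (not a replacement), the I/O requests that SQL Server itself is currently waiting on can be listed from inside the instance. This is a rough sketch, assuming SQL Server 2005 or later; it is a point-in-time snapshot, so it needs to be run several times while the problem is occurring.

```sql
-- I/O requests issued by SQL Server that have not yet been processed.
SELECT  DB_NAME(vfs.database_id) AS database_name,
        vfs.file_id,
        pio.io_type,
        pio.io_pending,           -- nonzero while Windows has not yet completed the request
        pio.io_pending_ms_ticks   -- how long the request has been outstanding, in ms
FROM    sys.dm_io_pending_io_requests AS pio
        JOIN sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
          ON vfs.file_handle = pio.io_handle
ORDER BY pio.io_pending_ms_ticks DESC;
```

Rows that keep appearing with large io_pending_ms_ticks values for the same file point to the same conclusion as a persistently high disk queue length.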

Who is the culprit, SQL Server or the Operating System?
The problem lies only with your SAN or local disk I/O subsystem, not with SQL Server itself. In my case, I found that the client had installed a few other applications on the same drive, and these were forcing SQL Server to wait too long for its I/O requests to complete.

Monday, May 28, 2012

SQL Server: Too High Difference in CPU and Elapsed Time (Duration), Don’t Blame IO Every Time

There can be several causes behind a large difference between the CPU time and the elapsed time (duration) of a query executed by SQL Server. One of the most common is an I/O problem. This can easily be observed by executing the following query:
SELECT  *
FROM    sys.dm_os_wait_stats
If the number of waits and the average wait time are too high, then something is wrong on the I/O side. To find the root cause, you have to check several things, such as queries without proper indexes (which read a large number of pages), pressure on TempDB, and hard drive and memory performance.
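To make "number of waits" and "average wait time" concrete, the wait statistics can be narrowed to I/O-related wait types and the average computed per wait. This is a minimal sketch; which wait types count as I/O-related is my own assumption here, and the list can be adjusted.

```sql
-- Number of waits, total wait time and average wait time
-- for a handful of I/O-related wait types.
SELECT  wait_type,
        waiting_tasks_count,
        wait_time_ms,
        wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM    sys.dm_os_wait_stats
WHERE   wait_type LIKE N'PAGEIOLATCH%'
        OR wait_type IN (N'IO_COMPLETION', N'ASYNC_IO_COMPLETION', N'WRITELOG')
ORDER BY wait_time_ms DESC;
```

The numbers are cumulative since the last restart (or since the statistics were last cleared), so comparing two snapshots taken some minutes apart gives a better picture than a single run.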
Normally, when elapsed time is up to twice the CPU time, I/O waits could be the cause. But what if your query's elapsed time is 5, 10 or more times higher?
Recently, on one of our production servers, a query gave me surprising time statistics.
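To find cached queries with this kind of gap, the elapsed-to-CPU ratio can be computed from the plan cache statistics. This is a rough sketch, assuming SQL Server 2005 or later; the TOP (10) cut-off and the column aliases are arbitrary choices of mine.

```sql
-- Cached statements ordered by how much larger their elapsed time is
-- than their CPU time (both are reported in microseconds).
SELECT TOP (10)
        qs.execution_count,
        qs.total_worker_time  / 1000 AS total_cpu_ms,
        qs.total_elapsed_time / 1000 AS total_elapsed_ms,
        qs.total_elapsed_time * 1.0
            / NULLIF(qs.total_worker_time, 0) AS elapsed_to_cpu_ratio,
        st.text AS batch_text
FROM    sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY elapsed_to_cpu_ratio DESC;
```

A ratio close to 1 means the query spent most of its time on the CPU; a much higher ratio means it spent most of its time waiting for something.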

I know it’s a simple query, the indexes are properly applied, and it returns results quickly. The query executed in under 1 second (SSMS shows duration only in whole seconds), but the time statistics showed an elapsed time of more than 6 seconds.
Why is this happening?
This happened because the time stamp counters of my production server's CPUs are not synchronized with each other. This can be confirmed from the SQL Server log (SQL Server 2005 Service Pack 2 and later show this message):
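For anyone who wants to reproduce this kind of measurement, CPU and elapsed time for a single statement can be displayed with SET STATISTICS TIME. The query below is only a hypothetical stand-in for the real one.

```sql
-- Report parse/compile and execution times on the Messages tab.
SET STATISTICS TIME ON;

-- Hypothetical stand-in for the query under investigation.
SELECT  COUNT(*) AS object_count
FROM    sys.objects;

SET STATISTICS TIME OFF;
```

The Messages tab then shows a line of the form "SQL Server Execution Times: CPU time = ... ms, elapsed time = ... ms." for the statement, which is where the two values can be compared.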
The time stamp counter of CPU on scheduler id 13 is not synchronized with other CPUs.
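Instead of scrolling through the log viewer, the current error log can be searched for this message. This is a sketch that relies on sp_readerrorlog, which is not officially documented, so its behaviour may differ between versions; the arguments are assumed to be the log number (0 = current), the log type (1 = SQL Server error log) and a search string.

```sql
-- Search the current SQL Server error log for the time stamp counter warning.
EXEC sp_readerrorlog 0, 1, 'time stamp counter';
```

If the call returns no rows, the message has not been written to the current log, and older logs can be checked by increasing the first argument.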
The good thing is that nothing is wrong with your SQL Server performance, and everything will work fine. The only problem you may face is that your performance tuning will be affected, because you cannot collect correct information about query execution time.
This usually happens when you change power policies or install utilities that affect CPU performance. You can try to resolve it by setting the machine's power options to “Always On” or “Max Performance”. If that doesn't work for you, try installing Service Pack 3 for SQL Server 2005.
