SQL Server only reads data that is in memory (the data cache, also known as the buffer pool). If the data needed is not cached, SQL Server will retrieve the data from disk, load it into the data cache, and then use the data from the cache.
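If you want a quick sense of how that cache is being shared, here is a minimal sketch using sys.dm_os_buffer_descriptors, which exposes one row per cached 8 KB page, to approximate how much of the buffer pool each database is holding:

-- Approximate buffer pool usage per database.
-- Each row in sys.dm_os_buffer_descriptors is one 8 KB page, so / 128 = MB.
SELECT
    DB_NAME(database_id) AS DatabaseName,
    COUNT_BIG(*) / 128   AS Cached_MB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY Cached_MB DESC;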
I have a general guideline that Table Scans and Index Scans are bad. This may not be an issue for small tables, but for large tables scans can cause significant performance problems. For example, if a query accesses a table that is 20 GB in size and a scan occurs, there is a good chance that all of the data for that table will be loaded into memory. If the data is not already in memory, SQL Server must fetch it from disk, which is usually an expensive IO operation. If there is not enough available space in the data cache, SQL Server will remove (flush) data from the cache to make room for the data retrieved from disk.
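One way to check whether scans are hitting your important tables is sys.dm_db_index_usage_stats. A sketch (note that these counters reset when the instance restarts):

-- Spot scan-heavy tables and indexes in the current database.
SELECT
    OBJECT_NAME(s.object_id) AS TableName,
    i.name                   AS IndexName,
    s.user_seeks,
    s.user_scans,
    s.user_lookups
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY s.user_scans DESC;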
Data that is used often and cached can be pushed out of the cache by poor queries or a missing index. Here’s a contrived example. We have two tables. The first table, Orders, contains 10 million records and requires 3 GB of disk space. The Orders table is extremely important; it is used in most queries, and queries that access it must return very quickly. The second table, TaskLog, contains 200 million records and requires 7 GB of disk space. For simplicity, neither table has any nonclustered indexes.
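For reference, the two tables might look something like this. The column lists are hypothetical; only the clustered keys and the relative sizes matter for the example:

-- Hypothetical shapes for the two tables in this contrived example.
CREATE TABLE dbo.Orders (
    OrderID   INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME          NOT NULL,
    Amount    MONEY             NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);

CREATE TABLE dbo.TaskLog (
    TaskLogID BIGINT IDENTITY(1,1) NOT NULL,
    TaskType  INT                  NOT NULL,
    LoggedAt  DATETIME             NOT NULL,
    Detail    VARCHAR(400)         NULL,
    CONSTRAINT PK_TaskLog PRIMARY KEY CLUSTERED (TaskLogID)
);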
Let’s presume that the server has 8 GB of memory. If all queries are executed against the Orders table, eventually most of the data from the Orders table would be in the data cache. There would be little need for SQL Server to access the disk, and queries would execute fairly fast.
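One way to watch whether the cache stays in this healthy state is the Page Life Expectancy counter, which is roughly how many seconds a page survives in the cache; a sudden drop is a classic sign of cache churn. A sketch:

-- Page Life Expectancy (seconds a page is expected to survive in cache).
SELECT [object_name], counter_name, cntr_value AS PLE_Seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';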
Now, UserA queries the TaskLog table to get counts per TaskType (see the example query below). When the user executes this query, a table scan is used. Since the data is not in memory, SQL Server will transfer the data from disk to memory. The problem is that there is not enough memory to hold both the Orders and TaskLog tables, so SQL Server will flush Orders data from memory and replace it with the TaskLog data.
SELECT TaskType, COUNT(*) AS TaskCount FROM TaskLog GROUP BY TaskType
Now the issue is that any query that needs Orders data must retrieve it from disk again, which incurs a performance penalty.
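You can see this penalty directly with SET STATISTICS IO, which separates reads served from cache from reads that had to come from disk. A minimal sketch against the hypothetical Orders table:

SET STATISTICS IO ON;

SELECT COUNT(*) FROM dbo.Orders;  -- hypothetical query against the example table

SET STATISTICS IO OFF;
-- The Messages tab reports something like:
--   Table 'Orders'. Scan count 1, logical reads N, physical reads M, read-ahead reads K ...
-- Logical reads come from cache; high physical/read-ahead reads mean disk IO.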
There are many options to solve this problem: indexes could be created on the Orders and TaskLog tables, more memory could be added, and there are probably other options as well.
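For the contrived example, one plausible fix is a narrow nonclustered index on TaskType (index name hypothetical). The GROUP BY query can then be answered by scanning the index, which is a small fraction of the 7 GB table, so far fewer pages need to enter the cache:

-- Hypothetical index: lets the TaskType aggregate read a narrow index
-- instead of dragging the entire table into the buffer cache.
CREATE NONCLUSTERED INDEX IX_TaskLog_TaskType
    ON dbo.TaskLog (TaskType);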
But how do you identify whether memory allocation is a problem? Below is a query that retrieves the space used by all clustered and nonclustered indexes. It shows the size of each index on disk and how much of it is in memory.
SELECT
    PhysicalSize.TableName,
    PhysicalSize.IndexName,
    PhysicalSize.Index_MB,
    BufferSize.Buffer_MB,
    CASE WHEN Index_MB != 0 AND Buffer_MB != 0
         THEN CAST(Buffer_MB AS float) / CAST(Index_MB AS float) * 100
         ELSE 0
    END AS IndexInBuffer_Percent
FROM
(
    -- Size of each index on disk (used_pages are 8 KB, so / 128 = MB)
    SELECT
        OBJECT_NAME(i.object_id) AS TableName,
        i.name                   AS IndexName,
        i.index_id               AS IndexID,
        SUM(a.used_pages) / 128  AS Index_MB
    FROM sys.indexes AS i
    JOIN sys.partitions AS p
        ON p.object_id = i.object_id
       AND p.index_id  = i.index_id
    JOIN sys.allocation_units AS a
        ON a.container_id = p.partition_id
    GROUP BY i.object_id, i.index_id, i.name
) AS PhysicalSize
LEFT JOIN
(
    -- Pages from each index currently in the buffer cache
    SELECT
        obj.[name]         AS TableName,
        i.[name]           AS IndexName,
        obj.[index_id]     AS IndexID,
        i.[type_desc],
        COUNT_BIG(*)       AS Buffered_Page_Count,
        COUNT_BIG(*) / 128 AS Buffer_MB   -- 8 KB pages to MB
    FROM sys.dm_os_buffer_descriptors AS bd
    INNER JOIN
    (
        SELECT
            OBJECT_NAME(p.object_id) AS name,
            p.index_id,
            au.allocation_unit_id,
            p.object_id
        FROM sys.allocation_units AS au
        INNER JOIN sys.partitions AS p
            ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)      -- in-row / row-overflow data
            OR (au.type = 2       AND au.container_id = p.partition_id) -- LOB data
    ) AS obj
        ON bd.allocation_unit_id = obj.allocation_unit_id
    LEFT JOIN sys.indexes AS i
        ON i.object_id = obj.object_id
       AND i.index_id  = obj.index_id
    WHERE bd.database_id = DB_ID()
    GROUP BY obj.name, obj.index_id, i.[name], i.[type_desc]
) AS BufferSize
    ON PhysicalSize.TableName = BufferSize.TableName
   AND PhysicalSize.IndexID   = BufferSize.IndexID
ORDER BY Buffer_MB DESC
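Note that sys.dm_os_buffer_descriptors is filtered to the current database (DB_ID()), so run the query in the context of the database you want to inspect.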
Here is a sample result from the query (names have been changed to protect the innocent):
| Table Name | Index Name | Index MB | Buffer MB | Index In Buffer Percent |
| --- | --- | --- | --- | --- |
| Table1 | PK_Table1 | 211875 | 20586 | 10% |
| Table2 | PK_Table2 | 3711 | 3348 | 90% |
| Table3 | PK_Table3 | 27689 | 2246 | 8% |
| Table4 | IX_Table4_A | 52181 | 1675 | 3% |
| Table5 | PK_Table5 | 278409 | 1436 | 1% |
| Table4 | IX_Table4_B | 28585 | 1418 | 5% |
| Table2 | IX_Table2_A | 725 | 745 | 103% |
| Table6 | PK_Table6 | 572 | 572 | 100% |
| Table3 | IX_Table3_A | 15701 | 493 | 3% |
| Table3 | IX_Table3_B | 17756 | 467 | 3% |
| Table7 | PK_Table7 | 461 | 461 | 100% |
Table2 is equivalent to our Orders table in the example. It’s very important that results from this table are returned quickly. As we can see, 90% of the data for PK_Table2 is in memory; this is good.
PK_Table1 is 211 GB, and 20 GB of it are in memory. In this environment, speed in retrieving data from this table isn’t that important, and 20 GB in memory seems like too much. This could be an indication that a scan is being used to access the data, or that someone is running a query they shouldn’t. This gives me good information to further my investigation.
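To chase down the culprit, a sketch that ranks cached statements by physical reads (actual disk IO) using sys.dm_exec_query_stats:

-- Top cached statements by physical reads since they entered the plan cache.
SELECT TOP (10)
    qs.total_physical_reads,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_physical_reads DESC;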
A single bad query can affect not just the performance of one table but the performance of the system as a whole.
