Table Scans and Index Scans Affect More Than the Table They Access

SQL Server only queries data in memory (data cache). If the data needed is not cached, SQL Server will retrieve the data from disk and load it to data cache, and then SQL Server will use the data from the cache.

As a general guideline, I consider Table Scans and Index Scans bad. This may not be an issue for small tables, but for large tables scans can cause significant performance issues. For example, if a query accesses a table that is 20 GB in size and a scan occurs, then there is a good chance that all of the data for that table will be loaded into memory. If this data is not in memory, then SQL Server must fetch the data from disk and load it into memory. Fetching data from disk is usually an expensive IO operation. If there is not enough available space in the data cache, SQL Server will remove (flush) other data from the cache to make room for the data that was retrieved from disk.
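One way to watch for this kind of cache churn is the Page life expectancy counter, which reports roughly how many seconds a page is expected to stay in the buffer pool. The sketch below reads it from sys.dm_os_performance_counters. It assumes you have VIEW SERVER STATE permission, and what counts as a "low" value depends on the server's memory and workload, so compare readings over time rather than against a fixed threshold.

```sql
-- Sketch: read the Buffer Manager "Page life expectancy" counter.
-- A sharp drop after a large query suggests pages were pushed out of the data cache.
SELECT
    RTRIM([object_name]) AS ObjectName,
    RTRIM(counter_name)  AS CounterName,
    cntr_value           AS PageLifeExpectancy_Seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy'
  AND [object_name] LIKE N'%Buffer Manager%';
```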

Data that is used often and cached can be removed from the cache due to poor queries or the lack of an index. Here’s a contrived example. We have 2 tables. The 1st table, Orders, contains 10 million records and requires 3 GB of disk space. The Orders table is extremely important: it is used in most queries, and queries that access it must return very quickly. The 2nd table, TaskLog, contains 200 million records and requires 7 GB of disk space. For simplicity, neither table has any nonclustered indexes.

Let’s presume that the server has 8 GB memory. If all queries are executed on the Orders table, eventually most of the data from the Orders table would be in the data cache. There would be little need for SQL to access the disk. Queries would execute fairly fast.

Now, UserA queries the TaskLog table. The query gets counts by TaskType (see example query below). When the user executes this query, a table scan is used. Since the data is not in memory, SQL Server will transfer the data from disk to memory. The problem is that there is not enough memory to hold both the Orders and TaskLog tables (3 GB + 7 GB = 10 GB, against 8 GB of server memory). Since there’s not enough memory, SQL Server will flush Orders data from memory and replace it with TaskLog data.

SELECT TaskType, Count(TaskType) AS TaskCount
FROM TaskLog
GROUP BY TaskType

Now the issue is that any query that needs to access Orders will have to retrieve its data from disk. This will incur a performance penalty.
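To see how much of a query's work comes from disk rather than the data cache, SET STATISTICS IO can be turned on for the session. The sketch below runs the example TaskLog query with it enabled. The dbo schema is an assumption, and TaskLog and TaskType are the hypothetical names from the example.

```sql
-- Sketch: report logical vs. physical reads for the example query.
-- dbo.TaskLog and TaskType are the hypothetical names from the example.
SET STATISTICS IO ON;

SELECT TaskType, COUNT(TaskType) AS TaskCount
FROM dbo.TaskLog
GROUP BY TaskType;

SET STATISTICS IO OFF;
```

In the messages output, "logical reads" counts pages read from the data cache, while "physical reads" and "read-ahead reads" count pages that had to be brought in from disk.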

There are many options to solve this problem: indexes could be created on both the Orders and TaskLog tables, more memory could be added, and there are probably other options.
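As a rough illustration of the index option, the sketch below creates a narrow nonclustered index on the column that the TaskLog example query groups by. The index name IX_TaskLog_TaskType and the dbo schema are assumptions made for this example.

```sql
-- Sketch: a narrow nonclustered index on the column the example query groups by.
-- IX_TaskLog_TaskType and the dbo schema are assumed names for illustration.
CREATE NONCLUSTERED INDEX IX_TaskLog_TaskType
    ON dbo.TaskLog (TaskType);
```

A count by TaskType would still have to read every index entry, but the index holds only TaskType plus the row locator, so it should be considerably smaller than the 7 GB table and push far less Orders data out of the cache.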

But how do you identify whether memory allocation is a problem? Below is a query that retrieves the space used by all clustered and nonclustered indexes. It shows the size of each index on disk and how much of it is in memory.

SELECT
    PhysicalSize.TableName
   ,PhysicalSize.IndexName
   ,PhysicalSize.Index_MB
   ,BufferSize.Buffer_MB
   ,CASE
       WHEN Index_MB != 0 AND Buffer_MB != 0 THEN
            -- Multiply by 100 so the value is a percentage, as the column name says
            100.0 * CAST(Buffer_MB AS Float) / CAST(Index_MB AS Float)
       ELSE 0
    END IndexInBuffer_Percent
FROM
(
    SELECT
        OBJECT_NAME(i.OBJECT_ID) AS TableName,
        i.name AS IndexName,
        i.index_id AS IndexID,
        SUM(a.used_pages) / 128 AS 'Index_MB'
    FROM sys.indexes AS i
    JOIN sys.partitions AS p ON
        p.OBJECT_ID = i.OBJECT_ID
        AND p.index_id = i.index_id
    JOIN sys.allocation_units AS a ON
        a.container_id = p.partition_id
    GROUP BY i.OBJECT_ID,i.index_id,i.name
) PhysicalSize
LEFT JOIN
(
    SELECT
        obj.[name] TableName,
        i.[name] IndexName,
        obj.[index_id] IndexID,
        i.[type_desc],
        COUNT_BIG(*) AS Buffered_Page_Count,
        COUNT_BIG(*) / 128 AS Buffer_MB -- pages are 8 KB, so 128 pages = 1 MB
    FROM sys.dm_os_buffer_descriptors AS bd
    INNER JOIN
    (
        SELECT object_name(object_id) AS name
            ,index_id ,allocation_unit_id, object_id
        FROM sys.allocation_units AS au
        INNER JOIN sys.partitions AS p ON
            au.container_id = p.hobt_id
            AND (au.type = 1 OR au.type = 3 OR au.type = 2)
    ) AS obj ON
        bd.allocation_unit_id = obj.allocation_unit_id
    LEFT JOIN sys.indexes i on
        i.object_id = obj.object_id
        AND i.index_id = obj.index_id
    WHERE database_id = db_id()
    GROUP BY obj.name, obj.index_id , i.[name],i.[type_desc]
) BufferSize ON
    PhysicalSize.TableName = BufferSize.TableName
    AND PhysicalSize.IndexID = BufferSize.IndexID
ORDER BY Buffer_MB DESC

Here is a sample result from the query (names have been changed to protect the innocent):

Table Name   Index Name    Index MB   Buffer MB   Index In Buffer Percent
Table1       PK_Table1     211875     20586       10%
Table2       PK_Table2     3711       3348        90%
Table3       PK_Table3     27689      2246        8%
Table4       IX_Table4_A   52181      1675        3%
Table5       PK_Table5     278409     1436        1%
Table4       IX_Table4_B   28585      1418        5%
Table2       IX_Table2_A   725        745         103%
Table6       PK_Table6     572        572         100%
Table3       IX_Table3_A   15701      493         3%
Table3       IX_Table3_B   17756      467         3%
Table7       PK_Table7     461        461         100%

Table2 is equivalent to our Orders table in the example. It’s very important that results from this table are returned fairly fast. As we can see 90% of data for PK_Table2 is stored in memory; this is good.

PK_Table1 is 211 GB and 20 GB are in memory. For this example speed in retrieving data from this table isn’t that important and 20GB in memory seems too much. This could be an indication that a scan is being used to access this data, or someone is running a query that they shouldn’t. This provides me some good information to further my investigation.
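One possible next step in that investigation is to check how often each index is scanned versus seeked. The sketch below uses sys.dm_db_index_usage_stats for the current database. Its counters are cumulative since the last SQL Server restart, so they show that scans happened but not which query caused them.

```sql
-- Sketch: count seeks vs. scans per index in the current database.
-- Counters reset when the SQL Server instance restarts.
SELECT
    OBJECT_NAME(us.object_id) AS TableName,
    i.name                    AS IndexName,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.last_user_scan
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i ON
    i.object_id = us.object_id
    AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
ORDER BY us.user_scans DESC;
```

An index with a high user_scans count and a large Buffer MB value in the earlier result is a reasonable candidate for a closer look.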

Having one bad query can affect not just the performance of 1 table but the performance of the system as a whole.

