Tag Archives: Automation

Plan Cache Pollution

I was working on a pretty common and simple problem the other day. I’m sure you’ve all seen servers where there are plenty of single use plans in the plan cache. All those single use plans are just using up memory for no real gain. One major point to bear in mind with all this is that SQL Server is basically an in memory system. SQL Server doesn’t directly update the data on disk. The data is loaded into memory, updated in memory and then eventually flushed to disk. This means that anything that consumes memory is taking away resources that may, or may not, be better utilized. Lets take a closer look, albeit still an overview at what happens with a read/write, and then solve our plan cache bloat problem.

Review of how SQL Server reads/updates data

As I already stated SQL Server will not update data directly on disk. Instead SQL Server will load the data from disk into memory. During ramp up (After a reboot or instance restart and before target memory is reached) the I/O is at the extent level (8 contiguous pages on disk) although can be more with read ahead during scans. Once that SQL Server has achieved steady state the I/O is at the “Right” size. What does right mean? Well if SQL Server needs an extent or more it will read that extent or read ahead for more data during that I/O. However if SQL Server needs a single 8KB page then the I/O operation is an 8KB page. This read is represented internally as a PAGEIOLATCH_XX (Where XX is usually EX or SH for write or read respectively) if you look in sys.dm_os_wait_stats. You can use the below query to see all in memory page latches: –

[code language=”sql”]
SELECT *
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N’pagelatch%’; /* Change to pageiolatch% to see the input/output latches which are similar except for I/O */
[/code]

With our data in memory, lets assume a single 8KB data page, SQL Server will now take the appropriate latch. Most likely either a PAGELATCH_SH (Read) or a PAGELATCH_EX (Write). We’ll ignore transactional locks and all other synchronisation mechanisms, they’re well covered elsewhere and not what I wanted to cover, also this is a simplification for this article. If we’re reading from the page SQL Server will now process that operation, likewise if writing or updating a record SQL Server has the opportunity to do so. This operation is now entirely in memory.

In the case of a write SQL Server will also generate a log record which will be flushed/hardened to disk on commit (Well that’s a good enough explanation for this post but isn’t always entirely accurate). The record would also be hardened to disk on rollback however another compensation log record would also be created and hardened to disk, so we would write 2 log records in case of a rollback. After, and only after, the log record of the changes have been flushed to disk can the data page be written out to disk. It is also important to appreciate that the write of the data page may not happen right away.

So when is a data page written to disk? There are several internal operations that can flush a data page to disk, sorry if a I miss something out here but they are basically as below (For checkpoint I have listed some operations that cause checkpoints): –

  • Checkpoint – this writes all dirty pages (Pages in memory that have data changes not yet reflected on disk) for a database to disk
    • Automatically based on the server or database recovery interval
    • Adding or removing a data or log file to/from a database
    • Any kind of backup operation
    • Any kind of clean shutdown of a database
    • Creating a database snapshot
    • Anything that causes a database snapshot to be created under the covers (eg, DBCC CHECKDB)
    • Putting a database into read only mode
    • Manual checkpoints
  • Lazywriter – this process tries to keep some free buffers (Think of these as memory slots for pages) in the buffer pool and will flush the least recently used pages to disk as needed
  • Eager writing – this process goes to work during operations like bulk loads and will flush those pages that are written to disk very quickly

So you can appreciate that allocating memory to execution plans that won’t be used more than once wastes memory that could be used for data. That is until SQL Server ages those plans out automatically. Also more data pages in the buffer pool means less physical I/O and therefore shorter waits. This generally leads to better performance. I say generally because I don’t want you to run off and flush your plan cache, that would be a very bad idea.

The solution

Now we can get back to the meat of this post. If you run the below query, which might be pretty intensive depending on your server spec so test first, you’ll see how many single use plans you have cached. These are execution plans that SQL Server needed to generate to execute a query but have not been used since that one time. I have grouped the results for single use plans into plans created in the last hour and older:-

[code language=”sql”]
SELECT SUM(CASE WHEN execution_count = 1 AND creation_time > DATEADD(HOUR, -1, GETDATE()) THEN 1 ELSE 0 END) AS single_use_plan_count_last_hour,
SUM(CASE WHEN execution_count = 1 AND creation_time < DATEADD(HOUR, -1, GETDATE()) THEN 1 ELSE 0 END) AS single_use_plan_count_older,
SUM(CASE WHEN execution_count > 1 THEN 1 ELSE 0 END) AS multi_use_plan_count
FROM sys.dm_exec_query_stats
OPTION(RECOMPILE); /* don’t add more bloat to the plan cache */
[/code]

The below query will show you how much memory is being consumed by query plans, again broken down into single use plans, multi use plans and total size: –

[code language=”sql”]
SELECT CAST(SUM(CASE WHEN usecounts = 1 THEN size_in_bytes / 1024.0 / 1024 ELSE 0 END) AS NUMERIC(19,2)) AS single_use_plans_mb,
CAST(SUM(CASE WHEN usecounts <> 1 THEN size_in_bytes / 1024.0 / 1024 ELSE 0 END) AS NUMERIC(19,2)) AS other_plans_mb,
CAST(SUM(size_in_bytes / 1024.0 / 1024) AS NUMERIC(19,2)) AS total_plan_cache_mb
FROM sys.dm_exec_cached_plans
OPTION(RECOMPILE);
[/code]

There are a couple of solutions if you have a lot of single use plans in the plan cache. The first and most obvious is to set the “optimize for ad hoc workloads” flag. This can be found on the instance properties under the advanced tab in SSMS. I consider this to be a standard setting for most installs that I do. However sometimes this cannot be set for whatever reason and you have a rather large plan cache full of single use plans. In this case there are other options.

You’ll need to baseline your systems to decide on what normal levels are for your system. However using these queries should give you some insight into your systems plan cache. If you find that you are using a lot of memory on old single use plans then you might want to flush those. The chances are that they will be aged out before they are used again. If you suspect that the plans older than a certain age won’t be re-used before being aged out then the query at the end of this post can help you remove them more aggressively. The query gets all the single use plans older than a certain age and then loops through removing them. One small point that I cannot define for you is “What is a lot of memory allocated to single use plans?” Unfortunately that depends on your system and I cannot answer that. It depends on your workload and the amount of memory that your system has.

In the query below you need to adjust the waitfor delay at the bottom so that you don’t cause contention on the plan cache and the age of the plans that you want to remove. Otherwise you should be able to run it manually or from an agent job. I would personally recommend creating a procedure for it with the parameters you want but I don’t want to spoil the fun of reading it to understand the script before running it.

Just as an example for my Microsoft Surface the query made the below difference, the instance hadn’t been up for more than a few hours: –

Runsingle_use_plan_count_last_hoursingle_use_plan_count_oldermulti_use_plan_count
Before110158
After3058
Runsingle_use_plans_mbother_plans_mbtotal_plan_cache_mb
Before14.8916.7031.59
After0.4615.6916.15

Drop_ single_use_plans