I recently wanted to know how much of my database a differential backup would actually need to read and backup. I soon had an idea to scan the Differential Change Map (DCM) pages in the data files. To understand what these pages track lets have a quick review of how SQL Server stores data.
How does SQL Server store data?
SQL Server stores data for each entity in database pages, pages are 8KB chunks of data in the data files. These pages are organised into 64KB extents, or 8 contiguous pages. SQL Server also stores allocation pages that track the usage of pages/extents within the data files. One of these allocation pages is the DCM which tracks whether or not any part of any page in an extent has been modified since the last full backup. So if I had a data page on the 30th page in my database and this was amended since the last full backup then the relevant DCM page would show that the 4th extent had been modified since the last full backup.
The DCM pages track changes to extents since the last full backup. This allows SQL Server to determine, very efficiently, which extents need to be backed up in the next differential backup. The first DCM page is the 7th page (zero based so page 6) in the database. The below command can be used to view the page: –
DBCC PAGE( myDatabase, 1, 6, 3 ) WITH TABLERESULTS, NO_INFOMSGS;
Also DCM pages are repeated every 511,232 pages in the database. Given this information we can use the below pseudo code as a basis for developing our script: –
- Get a list of all the data files (Exclude log files) in our database
- Iterate over all the data files to load all the DCM pages
- Follow all the DCM pages in each file
- Read all the data in those DCM pages
- Save the relevant data into a temp table
- Using the result table work out how many pages in each DCM interval, the section of a data file that the DCM relates to, will be backed up
- Sum the results
- Generate a summary result set
I have a sample output in a couple of tables below as a summary of a scan against a small test database. I have previously run a very similar script successfully against a multi-terabyte database. Every DCM page represents just under 4GB of data in your database so even at 4TB you would have only about 1,000 DCM pages to read. There is obviously a small amount of overhead for the temp tables and to get the file list. Thereafter for every DCM page there will be 14 page reads. So as you can see this is a pretty light-weight script and highly efficient.
Summary result set
Detailed output, this shows one result line per contiguous set of extents where at least one record in that extent has been modified
In order to use this script please change the database name on line 60. Then run the script. I would recommend running against a test or dev instance initially. Lastly in preference to running against a production instance I personally run this against a log-shipped database. Please let me know if this script proves useful or if you have any issues.