This is a post on how to enable Change Data Capture (CDC) in SQL Server. Enabling this on a database and table will allow the recording of activity when tables and rows have been modified. This is a feature available for us to configure within SQL Server and Azure SQL out of the box.
CDC uses the SQL Agent to record inserts, updates and deletes that happen on a table. Enabling this might be useful to help give a business insight on changes to Personal Identifiable Information (PII) over time for GDPR requirements. Or, another example could be real-time tracking of orders for pricing updates to a product.
For more examples and information on what CDC is, we can refer to MS Docs : What is change data capture (CDC)?
If you are considering enabling this feature, as always we should test and read through the software documentation to ensure all potential impacts and limitations have been reviewed. Some main considerations on the systems side with this one include the additional disk space your CDC tables will need, and performance depending on the number of tables/changes happening.
Before enabling CDC on a database we want to first check if it’s already enabled, and also check if the database owner is set to something other than ‘sa’, else you’ll receive an error message when enabling.
In the demo TSQL below we’re checking the sys.databases system table is_cdc_enabled column, and we have to change the DemoBlog database owner :
-- Check if Database is has CDC Enabled SELECT is_cdc_enabled, * FROM sys.databases; -- (IF_REQUIRED) Change Database Owner to 'sa' -- Error if <> 'sa' = "Could not update the metadata that indicates database demoBlog is enabled for Change Data Capture" ALTER AUTHORIZATION ON DATABASE::[demoBlog] TO [sa] GO
Enabling CDC for a SQL Server database is done using the sys.sp_cdc_enable_db system stored procedure, as shown in this example :
USE [demoBlog]; GO -- Enable CDC for Database ms sql EXEC sys.sp_cdc_enable_db GO /* New system tables > [cdc].[captured_columns]; [cdc].[change_tables]; [cdc].[ddl_history]; [cdc].[index_columns]; [cdc].[lsn_time_mapping]; [dbo].[systranschemas]; */
We can see our new CDC system tables in the screenshot above, and I’ve re-run the sys.databases query to show the changes there too.
I’ve just enabled CDC on this demo database, so I know there are no tables already enabled. We can check the is_tracked_by_cdc column in the sys.tables system table, similar to the check we did before enabling it on the database.
To control the placement of change table data, we can create Filegroups ahead of enabling CDC on tables. Microsoft recommends we do this so that our change tables are separate from source tables.
-- Create FileGroup for CDC ALTER DATABASE [demoBlog] ADD FILEGROUP [DemoBlog_CT]; ALTER DATABASE [demoBlog] ADD FILE ( NAME = demoBlog_data1, FILENAME = 'D:\MSSQL_DATA\demoBlog_data1.ndf', SIZE = 500MB, FILEGROWTH = 50MB ) TO FILEGROUP [DemoBlog_CT]; SELECT * FROM sys.filegroups; SELECT * FROM sys.database_files;
We use sys.sp_cdc_enable_table to enable CDC on a table, as per the example below. When a table is enabled for CDC, a record will be logged in the transaction log for each DML operation applied to the table from this point onwards.
USE [demoBlog]; GO -- Enable CDC on Table EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'posts', @role_name = NULL, @filegroup_name = N'DemoBlog_CT' GO -- Job 'cdc.demoBlog_capture' started successfully. -- Job 'cdc.demoBlog_cleanup' started successfully.
Our new SQL Agent jobs for the CDC Collection are as per the screenshot below.
Interesting note, “If both CDC and replication are enabled for a database, the log reader handles the population of the CDC change tables.” – Microsoft Docs
For demo purposes, here is an example insert of a row into our table, and then a query of CDC data.
My next post covers Disabling CDC, which may be useful if you receive errors along the way during testing.