Tag: Deleting Data

  • Applying Data Retention to a SQL Table

    Applying Data Retention to a SQL Table


    There are many reasons why you’d need to delete old data from a table within a database, some of which include;

    # GDPR (adhering to data retention policies)
    Ever increasing tables becoming tougher to manage (e.g. a replication refresh is taking 2 days)
    # SQL Server Express limitations (10GB max database size)
    # The table is storing application errors & logs and you’ve been told it’s there to stay

    Maybe it’s a one-off job for you, and the deletions can be done within a days work. Or, you might find yourself dealing with very large tables, deleting daily during a small off-peak window over the course of however long… this is a task that can come along with many variations, and if asked the question, how would you do it yourself? Well, “it depends!”.

    I’ve run through a simple version of the task below, which also summarises a bunch of my other recent tech posts too.  All links below are internal.


    Deleting Data in SQL Server

    1. Get a hold of a test database and restore it on your SQL Server.

    2. Look for one of the larger tables and pick your fancy.

    3. Get row counts by date. For this example, I’m storing the row count information into a temp table. This deletion won’t take long so my session will still be active.

    4. Perform delete in batches.

    5. Get new row counts and compare against previous results.

    Plus a few additional checks to verify.

    Looking good – All data on orders beyond 7 years (1 January 2012 00:00) has been deleted.

    When the initial task is done and all tables are meeting retention periods, we’d then need to consider things like scheduled SQL Agent jobs to maintain the tables every month.


  • Batch Deletions in SQL Server

    Batch Deletions in SQL Server


    This script will delete rows within a table that are out-with the specified retention period. This can be useful if you’re running deletes on large tables, and particularly if;

    # Deletes are taking a long time & you’re finding it difficult to track progress.
    # You have a short maintenance window each day to perform such tasks, which could be due to caution over blocking/deadlocks.
    # Your server is not equipped to handle the log file bloat associated with running the delete as a single transaction.

    Whatever the reason is, this is a useful script to have in the toolbelt.

    I’ve set this to delete 10k rows at a time, although you can find your own sweet spot.

    To find out how long it’ll take until all rows have been deleted, you could;
    1. Get row counts.
    2. Run deletion script for 10 minutes.
    3. Check how many rows were affected.
    4. Refresh Disk Usage by Top Tables report / get new counts.
    5. Compare numbers above.
    6. If the math works out to be 9 months before it’ll finish, then get an index added.