How to perform a large delete operation

Deleting data from your production databases can be tricky. You can either choose TRUNCATE or DELETE statements to suit your need. Here in this blog, I will focus more on how to DELETE data efficiently and will have a bonus section on TRUNCATE.

The two very important factors I consider when deleting records from the database are:

  • Number of rows to be deleted
  • Percentage of rows in the table to be deleted

Consider a table with 100,000,000 records in it.

Case 1: Number of records to be deleted = 10,000

Since the number of records to be deleted is small, it is perfectly fine to use a single DELETE. It should not have much impact on the transaction log (Remember DELETE statement is fully logged).

DELETE FROM dbo.Test 
WHERE Id <= 10000

Case 2: Number of records to be deleted = 10,000,000

In this case, the percentage of rows to be deleted is still on the lower side but the number of rows to be deleted is much higher, We are deleting 10 Million records here. Going with a single delete statement like in Case 1, can cause logs to grow significantly and welcome the lock escalation. To avoid such issues, we can delete the records in chunks. Check this blog from Aaron Bertrand for more.

SET NOCOUNT ON;
DECLARE @Counter INT = 1;
DECLARE @BatchSize INT = 10000; -- Split the large delete to smaller chunks
WHILE @Counter > 0
BEGIN   
     BEGIN TRANSACTION;    
          DELETE TOP (@BatchSize)    
          FROM dbo.Test   
          WHERE Id <= 10000000;    
          SET @Counter= @@ROWCOUNT;   
      COMMIT TRANSACTION;  
END

The added benefit of wrapping each batch size under its own transaction is if you have to stop your delete statement for “reasons”, then previous batches will already be committed and when you start over, you will have lesser records to delete.

Case 3: Number of records to be deleted = 75,000,000

Oh yeah, baby, we are deleting 75 Million records (75% of the total records in the table) and of course a whole lotta data …woo hoo. Let us look into a more optimal approach compared to the above two cases.

/* Using a minimally logged bulk operation, copy all the records (not to be deleted) from the Test table to TestStaging */
SELECT Id, Description, Date 
INTO dbo.TestStaging
FROM dbo.Test
WHERE ID >= 75000000; --Rows to remain the table

/* Drop the original table */
DROP TABLE dbo.Test;

/* Rename the staging table to the original table name */
EXEC sp_rename 'dbo.TestStaging', 'Test';

/* Create constraints, indexes, and permissions on the Test table (Don't forget Triggers if you had one before) */

    Or delete EVERYTHING

    TRUNCATE TABLE dbo.test;

    The truncate statement requires much stronger permission (minimum ALTER permission) but is a very efficient way of deleting all rows from a target table. The truncate statement has significantly less logging compared to the delete statement as it only requires logging extents/pages that were deallocated for rollbacks.

    BEGIN TRANSACTION
        DECLARE @Identity AS INT = IDENT_CURRENT('dbo.Test') + 1;
        TRUNCATE TABLE dbo.Test;
        DBCC CHECKIDENT('dbo.Test', RESEED, @Identity);
    COMMIT TRANSACTION

    Truncate statement resets the IDENTITY property if it exists on the table, unlike delete. If you want to retain the current identity value but clear out the data using Truncate, then make sure you capture the current identity value (plus one) and reseed the identity property after truncating.

    Also, remember you cannot Truncate a table if there are foreign keys pointing to that table or if there are indexed views based on that table.