SQL Server Integration Services helps you move data from application databases to a reporting warehouse or just move data from point A to point B for other business reasons. Different elements are part of such data movement such as importing data from disparate sources, cleanse and transform the data before storing them in the destination server.
With a good SSIS package design, data flow can be painless. Let’s focus on the data flow engine and how we can fine-tune it for better performance.
Data Flow Buffers
When data is extracted from a source, SSIS maps all those column data types to their local data type. SSIS engine then allocates memory buffers to store incoming data from the source on which transformations can be applied. Better use or reuse of such buffers by the transformations used can elevate the SSIS performance.
Data Flow Transformation
It gives you the ability to transform incoming data as needed. There is a range of transformations that can be done but such tasks can block the operation. Types of blocking in data flow transformations:
- Non-blocking – No waits while each row goes through a transformation. Transformations like Derived Column, Lookup, Multicast
- Partial blocking – Some wait until sufficient rows are stored. Transformations like Merge, Union All
- Blocking – Wait until all rows are read. Transformations like Aggregate, Sort, Fuzzy Lookup
Synchronous components are generally faster compared to asynchronous components because of the reuse of the buffers.
There is a couple of important data flow optimization that can help with buffer estimates. Configure DefaultMaxBufferRows and DefaultMaxBufferSize to get a bigger buffer but always test the values before pushing to production.
DefaultMaxBufferSize – It stores the maximum buffer size which is 100MB. The default value is 10MB.
DefaultMaxBufferRows – The default is set to 10,000 records.
Note: Make sure the value for DefaultMaxBufferRows (including the size of the total rows) should be approximately the same as DefaultMaxBufferSize. You can enable logging on the data flow task and see the number of records each buffer hold using the BufferSizeTuning event.
At times another property AutoAdjustBufferSize can do the trick for you instead of manually updating to find a good buffer size. When this property is enabled, the size of data flow buffers will automatically change to match the records configured with DefaultBufferMaxRows.
Once you understand buffers and transformation, try to avoid asynchronous transformations and leverage the underlying database engine to do the heavy lifting if it helps (like sorting). Never pull more data than you need, not just the rows but also never used columns. To enhance data read performance, instead of using a dropdown to select a source table write a query with filters and sorting to get what you need. The more records you can fit in a single buffer, the better it is so try to keep the data types of the columns as small as possible.
Granted you have multiple processors, use parallel execution. Use control flow to benefit SSIS’s highly parallel engine.