Typically Integration Services come with a rich set of Tasks, Containers, Transformation Functions and Data Adapters for support and maintenance of business applications. SSIS provides similar services which have evolved over the years from a simple DTS Executable utility to a full-fledged Integration Service offered with the SQL Server Database Platform. SSIS can help in solving complex business problems using ETL and BI and also manage SQL Server Databases and other SQL Server objects.
Business Scenarios for SSIS Usage
SSIS is used in a variety of Data Transformation, Migration and Warehousing scenarios. Few Typical business scenarios are:
· Copying subset of data from a Large Volume Transactional Database to a Reporting Database to generate various Operational and Analytical Data Reports
· Migration and Replication of Data from a Development or Test Database Server to Production Servers and vice-versa.
· Merging Data from various heterogeneous Data Sources to s Single Destination Database Store for further usage or analysis. Further usage of such data will include using them as archive information for generating trend analysis reports and standardizing the data formats for future transactions of an application.
· Aggregating data on a periodic basis using batch processing for Reporting application for faster report data processing.
· SSIS also helps in cleaning and standardization of data which comes from various data sources before loading to destination database.
· It also helps Database administrators in automating the administrative functions like backup and restore. They can be scheduled over SQL Server Agent jobs which help in scheduling memory intensive jobs at non-peak hours and hence prevent interference to regular transaction performance especially in cases of OLTP and OLAP Applications.
In summary SSIS helps in providing a Business Intelligence Tool for data transformation processes. From a simple data extraction, load and transform utility SSIS has evolved into a more flexible utility which is easier to debug since it has a graphical user interface and can perform tasks ranging from execution of workflows, data transformations to aggregate, merge and copy data and administrative tasks like data backup and restore. Also it has an application programming interface for programming the integration services tasks using the SSIS object model.
DTS to SSIS
DTS was Microsoft's first ETL utility which was introduced with SQL Server 7.0. It helped improve its Business Intelligence Capabilities. However with the advent of SSIS, Microsoft introduced a new processing engine which provides in-memory buffers, data extraction capabilities and data transformation capability to modify data and make it available to other processes. It enhances the DTS capabilities in the following aspects:
· Faster data processing and high volume data processing
· SSIS provides enterprise level capabilities from simple import/export functionality to complex data transformations which can utilized in large scale data warehousing applications
· It is more easily customizable and scalable as it leverages the .NET framework capabilities to build custom components if not provided out of the box, unlike DTS which did not have a common framework.
· It provides a robust mechanism for iterative processing by using For Loop and ForEach containers not present in DTS
SSIS Performance Issues and Optimization
Since SSIS executes on large chunks of data it comes with a set of performance concerns like memory utilization during SSIS package executions, CPU load balancing, I/O processing speed and network utilization. However as per the SQL Performance team, SSIS can load 1.18 TB of data in 30 minutes and can process 4.5 million sales transaction rows per second.
Some of the design practices to ensure that SSIS packages perform up to expected levels are:
· Ensure that all transformations occur in memory
· Perform capacity planning by understanding resource utilization
· Optimize the SQL Server data sources, destinations and transformation lookups using optimized SQL queries and stored procedures, indexed tables and optimized data transformation methods.
· Logging if done in an SSIS package should be minimal to minimize memory usage.
There are various methods to troubleshoot and debug low memory condition in SSIS. Few methods can be:
· Execute SSIS on a separate computer that is not executing an SQL Server instance.
· During package execution set the Maximum server memory to a small value to increase available memory.
· Wherever applicable execute SSIS tasks in sequential manner rather than parallel to handle low memory conditions.
· Tweak the values of DefaultMaxBufferRows and DefaultMaxBufferSize to get as many records as possible.
· Avoid unnecessary column usage in Dataflow tasks and configure data types correctly to reduce estimated size.
· Partially blocking transformations like Merge, Merge Join and Union All and Blocking transformations such as Sort and Aggregate consume the maximum amount of memory since a separate buffer is created for their outputs and new thread is introduced in the dataflow. These transformations are asynchronous in nature and should be used carefully to avoid low memory conditions.
· Key tools and techniques for debugging SSIS issues are SSIS logging, SSIS performance counter monitoring and SQL Server Profiler.
SSIS Improvements and Capabilities in SQL Server 2012
With the release of SQL Server 2012, further improvements have been made in troubleshooting and logging features of SSIS like capturing data flow component timing information and row counts for all paths within a data flow. Also data taps can be added to a data flow path to capture the data in CSV format during package execution which aids in troubleshooting data issues.
Also SSIS is a useful utility to migrate data from an on premise SQL Server to SQL Azure. There is however a limitation to this since only data can be moved and not database objects like tables, stored procedures or triggers. The reason for this is that SSIS uses SMO (SQL Server Management Objects) for moving objects and as of now SMO doesn't support SQL Azure. For this reason the database objects like tables and stored procedure need to be moved using Generate and Publish Scripts wizard and then followed by using SSIS for data transformation and migration. There are other alternatives of migrating data to SQL Azure like using the BCP utility to bulk copy data from source to destination (SQL Azure). However BCP lacks SSIS's ability to convert data from one data type to another and also SSIS workflow components.
As of now Microsoft has not come out with a clear strategy of supporting all SSIS Tasks on SQL Azure. During one of the Microsoft PDC (Professional Developers Conference) on SQL Server Services, ETL in the cloud was mentioned but there was no specific mention of SSIS. Many blog writers have questioned the need of ETL in the cloud which makes the future of SSIS in the cloud bleak.
Detailed Reading and further references on various aspects of SSIS are available at MSDN, Technet, Microsoft Support, MSSQL Tips and SQLServerPedia.