

To ETL or to EL-T: that is the question

Between ETL and EL-T, apart from the superficial difference in the positions of the letters L and T, there lies a deeper difference in the very philosophy of how data integration is performed in each of these patterns. Before we get into the technical differences and relative benefits, let me first lay down the two philosophies of data integration.
While ETL stands for "Extraction-Transformation-Load", in ETL the core, or most, of the transformations are performed on the ETL/Data Integrator (DI) engine, with the extraction and load routines running on the source and target database engines. An EL-T philosophy of data integration, on the other hand, uses the target database itself to perform the transformations: it applies what is popularly known as "pushdown" to the database engine.
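To make the contrast concrete, here is a minimal sketch of the EL-T pattern, assuming hypothetical staging and warehouse tables (stg_orders, dw_order_fact, dw_customer_dim); the tool only extracts and bulk-loads raw rows, and the transformation runs as set-based SQL on the target database.

-- EL-T sketch (hypothetical table names):
-- step 1: the DI tool extracts and bulk-loads raw rows into a staging table
--         on the target database (the "EL" part);
-- step 2: the transformation is "pushed down" as SQL that runs entirely
--         on the target engine (the "-T" part).
INSERT INTO dw_order_fact (order_id, customer_key, order_month, net_amount)
SELECT o.order_id,
       c.customer_key,
       TRUNC(o.order_date, 'MM')           AS order_month,
       o.gross_amount - NVL(o.discount, 0) AS net_amount
FROM   stg_orders o
JOIN   dw_customer_dim c ON c.customer_id = o.customer_id;
-- In a classic ETL flow the same join and derivation logic would instead be
-- modelled as transformations inside the DI engine, applied to rows in flight
-- between the extract and the load.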
As we all know, "Transformation" is where most of the resources and time are expended in ETL work, and hence a lot depends on where the transformation is done, how effectively it is done, and whether the right optimizations are applied while carrying it out. Working wisdom dictates that it makes more sense to run complex and resource-consuming transformations on the database engine rather than on the ETL engine; in other words, "pushdown" operations make sense when the operations are complex or the data volumes are large. I would believe that most ETL developers or designers have at some point been required, or forced, to create a PL/SQL or T-SQL package or stored procedure and call it from the ETL workflow to work around some complex or volume-intensive operation. So, theoretically, one can use an ETL tool and pad it with native database routines to get around the volume or complexity issue. However, this may lead to a very disruptive architecture and a complex data model, since the data transformed by the ETL engine must be made available to those packages and routines for further transformation. This shortcoming could be removed by adopting an EL-T philosophy for the complete data integration problem and performing all the transformation operations on the native database.
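For illustration, here is a sketch of the kind of native database routine an ETL workflow might call out to for a volume-intensive step; the procedure and table names (load_daily_sales_summary, stg_sales, daily_sales_summary) are hypothetical, not taken from any particular tool.

-- Hypothetical PL/SQL procedure invoked from an ETL workflow (for example as a
-- pre- or post-session task) so that a heavy aggregation runs on the database
-- rather than in the DI engine.
CREATE OR REPLACE PROCEDURE load_daily_sales_summary (p_load_date IN DATE) AS
BEGIN
  -- make the load re-runnable for the given day
  DELETE FROM daily_sales_summary WHERE load_date = p_load_date;

  -- set-based aggregation performed entirely inside the database
  INSERT INTO daily_sales_summary (load_date, product_id, total_qty, total_amount)
  SELECT p_load_date, s.product_id, SUM(s.quantity), SUM(s.amount)
  FROM   stg_sales s
  WHERE  s.sale_date = p_load_date
  GROUP  BY s.product_id;

  COMMIT;
END load_daily_sales_summary;
/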
So how are the market leaders in the data integration space oriented vis-à-vis these two different integration patterns?
Most of the best-of-breed tools, like Informatica, IBM DataStage, BO-Data Integrator and Ab Initio, offer the ETL philosophy of data integration. Although they claim to be pure-play ETL players, they have always used some, and at times a large, amount of "pushdown" optimization. For example, BO-DI optimizes its aggregation, join and sort transformations by actually pushing them down to the database engine, making them run faster.
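What such a pushdown amounts to, in effect, is the tool generating SQL like the sketch below (the sales_detail table is hypothetical) instead of reading every detail row into its own engine.

-- Without pushdown, the DI engine would pull the detail rows across the
-- network and aggregate and sort them itself; with pushdown the tool emits
-- SQL along these lines, so the database does the heavy lifting and returns
-- only the reduced result set.
SELECT region,
       product_id,
       SUM(amount) AS total_amount
FROM   sales_detail            -- hypothetical source table
GROUP  BY region, product_id
ORDER  BY region, product_id;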
On the other end of the spectrum is Oracle with its Oracle Warehouse Builder (OWB), which essentially translated all the mappings created into PL/SQL packages and executed them on the Oracle RDBMS; this, I believe, is very much an EL-T philosophy, with all the transformations being done on the DBMS. As the icing on the cake, Oracle acquired Sunopsis in 2006 (a move I believe may have been deliberate), then probably the only other best-of-breed EL-T tool available in the market, and with the two being integrated in the near future I would expect to see a truly enriched EL-T tool in the making. An interesting pattern can also be observed with Informatica: Informatica PowerCenter, from version 8.0 onwards, offers the option of using either ETL or EL-T, or even a hybrid of the two. I would like to believe this is testimony to the fact that erstwhile pure-play ETL players are seeing merit in adopting an EL-T philosophy and offering it to their end users.


Good article! I agree that there are times when complex data-crunching work is pushed down to the DB engine. I believe that is because the developer wants control over how the logic is optimized, based on an understanding of the underlying data model and the data being crunched. This also requires a detailed understanding of how the database engine processes data. I am not sure how an EL-T engine like Oracle's ODI is capable of creating optimized DB-specific code (for Oracle, DB2, Teradata and so on). Also, I am not sure how well EL-T works in an appliance scenario like Netezza, which gives users only a limited handle to tweak how data is crunched. Until I understand that fully (and perhaps the market does too), I will continue to favor a hybrid approach.

Good article about the two data integration patterns! My take is that we have to treat them as patterns, which means they are solutions to problems arising in specific situations. ETL works best when we need the transformation (server) to scale outside the source or target database servers. Pure-play ETL tools can be maintained outside the databases and scaled based on need; from a developer perspective they also provide maintainability of code and a shorter learning curve. EL-T is definitely advantageous when the target database's functions are better optimized for the transformation than the tool's. Here again, from a developer perspective, I would suggest using the ETL tool's "pushdown" option rather than invoking custom stored procs.

The trend, however, is to keep the processing closer to the data in order to scale. Aster Data is one DWH vendor that has utilized this concept very well: they have introduced what is called in-database MapReduce. MapReduce is a model for processing large data sets, popularised by Google, and such jobs can be treated as the equivalent of transformations in the overall data analysis process. They have embedded this logic right into the database to avoid the overhead of moving data between the database and the analysis server. That way they are easily able to scale to petabyte-level processing.

Thanks for your valuable posting.
