In this post, we’ll discuss the difference between ETL and ELT and when you might choose one over the other. We’ll also include a flowchart to help walk you through the ETL vs ELT decision-making process.
The difference between ETL and ELT
What’s the difference between ETL and ELT? The short answer is it’s all about when you transform. The longer answer? Well, read on!
The ETL Process
ETL (or Extract, Transform, Load) is the process of gathering data from source systems into a central data warehouse for analytics.
- Extract: Your traditional ETL process first extracts the data. In this step, the data's validity should be checked, and any invalid data can be rejected or corrected.
- Transform: Next any necessary transformations are performed. Transformations are a series of rules or functions as required by business and technical requirements. They can be anything from ensuring the correct format of the data to combining it with other data sources.
- Load: Lastly, the data will be loaded into the end target system. This can be anything from a simple flat file to a cloud-based data warehouse.
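The three steps above can be sketched in a few lines of Python. This is a minimal, illustrative example, not a production pipeline: the CSV data, column names, and SQLite target are all assumptions standing in for real sources and warehouses.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export of orders (illustrative only).
raw_csv = """order_id,amount,currency
1001,19.99,usd
1002,5.50,USD
1003,,usd
"""

# Extract: read records from the source.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: validate and normalize before loading.
clean = []
for row in rows:
    if not row["amount"]:  # drop records with missing amounts
        continue
    clean.append((
        int(row["order_id"]),
        float(row["amount"]),
        row["currency"].upper(),  # normalize currency codes
    ))

# Load: write the transformed records into the end target
# (an in-memory SQLite database stands in for the warehouse here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, currency TEXT)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)
db.commit()

print(db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2 valid rows loaded
```

Note that the transformation happens entirely in application code, before the target ever sees the data; that ordering is what makes this ETL rather than ELT.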
The ELT Process
So that’s the ETL process, but what about the ELT process? Well, remember: it’s all about when you transform.
Depending on the transformations necessary and the type of endpoint you choose, you may want to transform the data after loading it. This is referred to as an ELT process (or Extract, Load, Transform). ELT is useful when the end target provides enough processing power to perform any required transformations.
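For contrast, here is the same scenario reworked as ELT: the raw data is landed untouched in a staging table, and the transformation runs as SQL inside the target. SQLite stands in for a SQL-capable warehouse, and the table and column names are illustrative assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Extract + Load: land the raw data as-is in a staging table,
# with no cleanup or type casting on the way in.
db.execute("CREATE TABLE staging_orders (order_id TEXT, amount TEXT, currency TEXT)")
db.executemany(
    "INSERT INTO staging_orders VALUES (?, ?, ?)",
    [("1001", "19.99", "usd"), ("1002", "5.50", "USD"), ("1003", "", "usd")],
)

# Transform: run SQL inside the target to cast types, normalize
# currency codes, and filter out invalid rows.
db.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(amount AS REAL)      AS amount,
           UPPER(currency)           AS currency
    FROM staging_orders
    WHERE amount <> ''
""")

print(db.execute("SELECT currency, COUNT(*) FROM orders GROUP BY currency").fetchall())
```

Because the transformation is just a SQL statement in the target, changing the business rules later means editing a query and re-running it against the raw staging data, rather than re-extracting from the source.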
ETL vs ELT flowchart
This handy flowchart can help you decide which process is right for you.
When to choose ETL vs ELT
If your endpoint target is a cloud-based data warehouse such as Amazon Redshift, or another relational database, then performing transformations after loading is a cinch. Once loaded, the data can be transformed using SQL. Complex transformations can be run easily and changed as needed.
If instead your endpoint is flat storage such as a data lake, then any transformations should be run before loading, as this kind of storage offers little ability to perform transformations itself. To take advantage of this more cost-effective storage, you must perform transformations before loading.
The nature of the transformations you wish to run will also play a role in which process you choose.
Very simple transformations should be performed before loading into your endpoint, regardless of its type. Many of these transformations fall into the data cleansing or data validation steps of your process, and completing them before loading ensures a consistent data set arrives at the target.
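A pre-load validation check can be as small as one function. This sketch assumes a hypothetical order record with `order_id`, `amount`, and `currency` fields; real validation rules would come from your own business requirements.

```python
def validate_record(record):
    """Minimal pre-load checks: required fields present and well-formed.
    The field names and rules here are illustrative assumptions."""
    try:
        return (
            record["order_id"] is not None
            and float(record["amount"]) >= 0       # amount parses as a non-negative number
            and len(record["currency"]) == 3       # ISO-style three-letter currency code
        )
    except (KeyError, TypeError, ValueError):
        return False  # missing or malformed fields fail validation

print(validate_record({"order_id": 1, "amount": "9.99", "currency": "USD"}))  # True
print(validate_record({"order_id": 2, "amount": "oops", "currency": "USD"}))  # False
```

Running checks like this before loading means both ETL and ELT pipelines start from the same consistent, validated data set.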
When choosing between ETL and ELT, it's all about when and how you transform. If your transformations are simple, or your endpoint does not support complex transformations, then choose an ETL process and perform the transformations prior to loading. However, if your transformations are complex and subject to frequent change, and your endpoint provides the processing power, then an ELT process gives you the flexibility to run transformations after loading the data.
Learn more about ETL and ELT
Want to learn more about ETL and ELT? Check out ACG’s ETL and ELT Basics course.
This course will discuss the tasks required for each step in the ETL process and look at different solutions for some real-world scenarios. We'll explore extracting data from multiple sources and formats, performing transformations on the data, and finally loading the data into its final location.
This course does assume some experience with relational databases, as well as standard Unix command-line tools such as sed, grep, and awk.
Ready to begin? Start a free trial now, and keep being awesome, cloud gurus!