ETL vs ELT: Explaining the Difference in Plain English
Clear explanation of ETL and ELT for data engineers — vocabulary, when to use each approach, trade-offs, and real-world phrases for technical discussions and interviews.
ETL and ELT are two foundational patterns for moving and transforming data in modern data engineering. The distinction matters — it affects tool choice, cost, latency, and the types of transformations you can perform. This guide explains the difference clearly, gives you the vocabulary to discuss trade-offs, and provides ready-to-use phrases for technical interviews and team conversations.
ETL: Extract, Transform, Load
What It Means
ETL stands for Extract, Transform, Load — the traditional data pipeline pattern where:
- Extract — data is read from source systems (databases, APIs, files)
- Transform — data is cleaned, shaped, and enriched in a separate processing layer — before reaching the destination
- Load — the transformed data is loaded into the target data warehouse or lake
The key characteristic: transformation happens outside the destination system, in a dedicated processing engine.
Classic ETL tools: Informatica, IBM DataStage, SSIS, Talend, Apache Spark (used in ETL mode), Apache NiFi
“Our legacy pipeline is ETL — we extract from Oracle, transform with a Spark cluster, and load clean data into Redshift. The transformation layer is a separate cluster we maintain.”
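The three steps can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the records are invented, and sqlite3 stands in for the target warehouse. The point is that `transform()` runs in the pipeline process, outside the destination.

```python
import sqlite3

# Hypothetical source records, standing in for an extract from a database or API.
def extract():
    return [
        {"id": 1, "email": " Alice@Example.com ", "amount": "19.99"},
        {"id": 2, "email": "BOB@example.com", "amount": "5.00"},
    ]

# Transform happens here, in a separate processing layer, before the data
# ever reaches the destination — the defining trait of ETL.
def transform(rows):
    return [
        (r["id"], r["email"].strip().lower(), float(r["amount"]))
        for r in rows
    ]

# Only cleaned, typed data is loaded (sqlite stands in for the warehouse).
def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, email TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT email, amount FROM orders").fetchall())
# → [('alice@example.com', 19.99), ('bob@example.com', 5.0)]
```

Note that the warehouse only ever sees clean data; the raw form of each record exists nowhere but in the source system.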
ELT: Extract, Load, Transform
What It Means
ELT stands for Extract, Load, Transform — the modern cloud data warehouse pattern where:
- Extract — data is read from source systems
- Load — raw data is loaded directly into the target data warehouse
- Transform — data is transformed inside the data warehouse using SQL
The key characteristic: transformation happens inside the destination system, leveraging the warehouse’s compute power.
Modern ELT tools: dbt (data build tool), Fivetran + Snowflake, Airbyte + BigQuery, Stitch + Redshift
“We’ve moved to an ELT pattern — Fivetran loads raw data into BigQuery, and dbt runs all transformations as SQL models inside the warehouse. No separate processing cluster.”
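The same toy data makes the contrast visible. In this sketch (again with invented records and sqlite3 as a stand-in warehouse), raw data lands first and the transformation is plain SQL executed inside the destination; in a real stack the `CREATE TABLE AS` would be a dbt model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw data lands in the warehouse unchanged, messy values and all.
conn.execute("CREATE TABLE raw_orders (id INTEGER, email TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, " Alice@Example.com ", "19.99"), (2, "BOB@example.com", "5.00")],
)

# Transform: runs as SQL inside the warehouse — the defining trait of ELT.
conn.execute("""
    CREATE TABLE clean_orders AS
    SELECT id,
           LOWER(TRIM(email)) AS email,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
""")

print(conn.execute("SELECT email, amount FROM clean_orders").fetchall())
# → [('alice@example.com', 19.99), ('bob@example.com', 5.0)]
```

Both `raw_orders` and `clean_orders` now live in the warehouse, so the cleaning logic can be changed and re-run at any time without touching the source systems.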
The Core Difference
| Aspect | ETL | ELT |
|---|---|---|
| Where transformation happens | Separate processing layer | Inside the destination system |
| Raw data in warehouse? | No (only transformed) | Yes (raw + transformed) |
| Transformation language | Various (Spark, Python, SQL) | Primarily SQL |
| Infrastructure | Separate ETL server/cluster | Just the data warehouse |
| Best for | Complex transformations, legacy systems | Cloud data warehouses, SQL-first teams |
| Re-processing | Re-run extract + transform | Re-run SQL models only (data already in warehouse) |
When to Use ETL
Complex or specialised transformations
If your transformations require:
- Machine learning feature engineering
- Complex data science operations not expressible in SQL
- Binary/proprietary format parsing (e.g., legacy EDI or mainframe formats)
ETL is typically more appropriate because the processing layer can run Python, Spark, or other tools beyond SQL.
Data must be cleaned before landing
If source data contains sensitive information (PII, PHI) that must be masked or removed before it enters the warehouse, ETL allows pre-landing transformation.
“We use ETL for the HIPAA-regulated data — PII is anonymised in the transformation layer before it touches our cloud warehouse.”
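A pre-landing masking step might look like the following sketch. The salt value and field names are illustrative, and a real HIPAA pipeline would involve far more than hashing one column; the point is only that the raw email exists in the transformation layer and never in the warehouse payload.

```python
import hashlib

# Hypothetical pre-landing transform: PII is hashed in the ETL layer,
# so the raw email never reaches the cloud warehouse.
def mask_pii(rows, salt="example-salt"):  # salt value is illustrative only
    masked = []
    for r in rows:
        digest = hashlib.sha256((salt + r["email"]).encode()).hexdigest()
        masked.append({"id": r["id"], "email_hash": digest, "amount": r["amount"]})
    return masked

rows = [{"id": 1, "email": "alice@example.com", "amount": 19.99}]
print(mask_pii(rows)[0].keys())  # no raw "email" key survives the transform
```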
Legacy system integration
Established ETL suites (Informatica, IBM DataStage) ship pre-built connectors for legacy systems such as mainframes and on-premises ERPs, which makes ETL the path of least resistance for these integrations.
When to Use ELT
Cloud data warehouse is the centre of the stack
If your stack is Snowflake, BigQuery, or Redshift, these systems are designed to handle large-scale transformation workloads. ELT is the natural fit.
SQL-first team
ELT with dbt means transformations are written in SQL, version-controlled in Git, and testable — approachable for analysts and data engineers alike.
You want raw data preserved
In ELT, the raw (untransformed) data is always in the warehouse as a historical record. Re-transforming it from scratch is possible at any time.
“The business changed the revenue recognition rule — with ELT, we just updated the dbt model and re-ran it. With our old ETL system, we would have had to re-extract two years of data.”
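That re-run scenario can be sketched concretely. The rule change and figures below are invented, and sqlite3 again stands in for the warehouse: because the raw table is preserved, adopting a new revenue rule is just dropping and rebuilding one SQL model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw data preserved in the warehouse — the historical record ELT gives you.
conn.execute("CREATE TABLE raw_sales (id INTEGER, gross REAL, refund REAL)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)",
                 [(1, 100.0, 10.0), (2, 50.0, 0.0)])

def build_revenue_model(conn, sql):
    # Rebuilding a model is just re-running SQL — no re-extraction needed.
    conn.execute("DROP TABLE IF EXISTS revenue")
    conn.execute(f"CREATE TABLE revenue AS {sql}")

# Original rule: revenue = gross.
build_revenue_model(conn, "SELECT id, gross AS revenue FROM raw_sales")

# The business changes the rule: revenue = gross - refund.
# Only the SQL changes; the raw data is already in the warehouse.
build_revenue_model(conn, "SELECT id, gross - refund AS revenue FROM raw_sales")

print(conn.execute("SELECT SUM(revenue) FROM revenue").fetchone()[0])
# → 140.0
```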
Iterative transformation development
In ELT, analysts can iterate on transformations in the warehouse SQL editor without waiting for ETL jobs to re-run.
Key Vocabulary
Source System
The source system is where data originates — a production PostgreSQL database, a SaaS application API (Salesforce, Stripe), a log stream.
Data Warehouse / Data Lakehouse
The data warehouse (Snowflake, BigQuery, Redshift) is the central analytical system — where transformed data is stored and queried.
Raw Layer / Bronze Layer
The raw (bronze) layer is where source data lands first, unchanged — the historical record of what arrived from source systems.
Staging Layer / Silver Layer
The staging (silver) layer contains cleaned, deduplicated, and lightly transformed data — usable across models.
Mart Layer / Gold Layer
The mart (gold) layer contains business-facing aggregations and dimensional models — what analysts and BI tools query.
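The three layers compose naturally as a chain of SQL models. This is a minimal sketch with invented tables (sqlite3 standing in for the warehouse): bronze keeps everything as it arrived, silver deduplicates, gold aggregates for the business.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bronze: raw events exactly as they arrived, duplicates and all.
conn.execute("CREATE TABLE bronze_events (user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO bronze_events VALUES (?, ?)",
                 [(1, 10.0), (1, 10.0), (2, 5.0)])  # note the duplicate row

# Silver: cleaned and deduplicated, reusable across downstream models.
conn.execute("""
    CREATE TABLE silver_events AS
    SELECT DISTINCT user_id, amount FROM bronze_events
""")

# Gold: the business-facing aggregation that analysts and BI tools query.
conn.execute("""
    CREATE TABLE gold_spend_per_user AS
    SELECT user_id, SUM(amount) AS total_spend
    FROM silver_events
    GROUP BY user_id
""")

print(conn.execute("SELECT * FROM gold_spend_per_user ORDER BY user_id").fetchall())
# → [(1, 10.0), (2, 5.0)]
```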
Idempotency
An idempotent pipeline produces the same result when run multiple times on the same input — critical for reliable data engineering. “All our transformations are idempotent — re-running them doesn’t create duplicate records.”
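One common way to get idempotency is a keyed upsert rather than a blind insert. A minimal sketch (sqlite3's `INSERT OR REPLACE` on a primary key; warehouses offer equivalents such as `MERGE`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# An idempotent load: upserting on the primary key means re-running the
# same batch produces the same table state, never duplicate records.
def load_users(conn, rows):
    conn.executemany("INSERT OR REPLACE INTO users VALUES (?, ?)", rows)

batch = [(1, "alice@example.com"), (2, "bob@example.com")]
load_users(conn, batch)
load_users(conn, batch)  # re-run: same result, no duplicates

print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
# → 2
```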
Data Freshness / Latency
Data freshness is how current the data in the warehouse is compared to the source. With a 4-hour refresh schedule, the data can be up to 4 hours stale at any given moment.
“Business requires T+1 freshness — yesterday’s sales data must be available by 6 a.m.”
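A freshness check is often just a timestamp comparison against an SLA. A small sketch (the 4-hour SLA and timestamps are illustrative values):

```python
from datetime import datetime, timedelta, timezone

# Compare the newest loaded timestamp against now; fresh if within the SLA.
def is_fresh(last_loaded_at, sla=timedelta(hours=4)):
    return datetime.now(timezone.utc) - last_loaded_at <= sla

recent = datetime.now(timezone.utc) - timedelta(hours=1)
stale = datetime.now(timezone.utc) - timedelta(hours=6)
print(is_fresh(recent), is_fresh(stale))
# → True False
```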
dbt (Data Build Tool)
dbt is the dominant ELT transformation framework — it runs SQL models inside the data warehouse with version control, testing, and lineage tracking.
Explaining the Trade-off in Interviews and Meetings
When recommending ELT:
“Given that the team is SQL-first and we’re on BigQuery, I’d recommend an ELT approach with dbt. We keep raw data in the warehouse, transformations are in version-controlled SQL, and analysts can iterate without waiting for ETL jobs.”
When recommending ETL:
“The source data contains PHI that must be anonymised before it reaches the warehouse. We need an ETL approach — transformation happens before load so sensitive data never lands in our cloud environment.”
When discussing the trade-off:
“ETL is more flexible for complex pre-processing, but ELT has won the modern cloud data warehouse landscape because re-running SQL models on data already in the warehouse is orders of magnitude cheaper than re-extracting and re-processing it through a separate cluster.”
Practice
Test your data pipeline vocabulary with the Data Engineering exercise set and explore all resources for data engineers in the Data Engineer learning path.