Data Engineering Zoomcamp
The Data Engineering Zoomcamp, offered by DataTalks.Club, is a comprehensive,
hands-on program covering the full data engineering lifecycle, from containerisation
and infrastructure provisioning to batch processing, streaming, and analytics
engineering. The curriculum is built around the tools and cloud-native workflows
that modern data teams use in practice.
Participants build end-to-end data pipelines using industry-standard technologies,
gaining practical experience with orchestration, data warehousing, transformation,
and distributed processing. The aim is to prepare them to design and maintain
production-grade data infrastructure.
Course Modules
Module 1
Containerisation & Infrastructure as Code
- Docker & Docker Compose
- PostgreSQL with Docker
- Terraform on GCP
Module 2
Workflow Orchestration
- Data Lakes overview
- Orchestration with Kestra
Workshop 1
Data Ingestion
- API reading & pipeline scalability
- Incremental loading with dlt
Module 3
Data Warehousing
- BigQuery fundamentals
- Partitioning & clustering
- ML in BigQuery
Module 4
Analytics Engineering
- dbt with DuckDB & BigQuery
- Testing & documentation
- Deployment pipelines
Module 5
Data Platforms
- End-to-end pipelines with Bruin
- Data quality & transformation
- Cloud deployment to BigQuery
Module 6
Batch Processing
- Apache Spark & DataFrames
- SQL with Spark
- GroupBy & Joins internals
Module 7
Streaming
- Apache Kafka fundamentals
- Kafka Streams & KSQL
- Schema management with Avro