Data Engineering Zoomcamp Certificate

Data Engineering Zoomcamp

The Data Engineering Zoomcamp, offered by DataTalks.Club, is a hands-on program covering the full data engineering lifecycle, from containerisation and infrastructure provisioning to batch processing, streaming, and analytics engineering. The curriculum is built around the tools and cloud-native workflows used by modern data teams.

Participants build end-to-end data pipelines with industry-standard technologies, gaining practical experience in orchestration, data warehousing, transformation, and distributed processing. The course prepares them to design and maintain production-grade data infrastructure.

Course Modules

Module 1
Containerisation & Infrastructure as Code
  • Docker & Docker Compose
  • PostgreSQL with Docker
  • Terraform on GCP
Module 2
Workflow Orchestration
  • Data Lakes overview
  • Orchestration with Kestra
Workshop 1
Data Ingestion
  • API reading & pipeline scalability
  • Incremental loading with dlt
Module 3
Data Warehousing
  • BigQuery fundamentals
  • Partitioning & clustering
  • ML in BigQuery
Module 4
Analytics Engineering
  • dbt with DuckDB & BigQuery
  • Testing & documentation
  • Deployment pipelines
Module 5
Data Platforms
  • End-to-end pipelines with Bruin
  • Data quality & transformation
  • Cloud deployment to BigQuery
Module 6
Batch Processing
  • Apache Spark & DataFrames
  • SQL with Spark
  • GroupBy & Joins internals
Module 7
Streaming
  • Apache Kafka fundamentals
  • Kafka Streams & KSQL
  • Schema management with Avro