Paper Details
Building a Scalable ETL Pipeline with Apache Spark, Airflow, and Snowflake
Authors
Ujjawal Nayak
Abstract
Extract, Transform, and Load (ETL) pipelines are critical in modern data engineering, enabling efficient data integration and analytics. This paper presents a scalable ETL pipeline leveraging Apache Spark for distributed data processing, Apache Airflow for workflow orchestration, and Snowflake as a cloud-based data warehouse. The proposed architecture ensures fault tolerance, cost efficiency, and high scalability, making it suitable for handling large-scale enterprise data workloads.
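Although the paper's implementation is not reproduced on this page, a minimal sketch of the orchestration layer it describes might look like the following Airflow DAG, which submits a Spark transform job and then bulk-loads the results into Snowflake. The DAG id, connection IDs, script path, stage, and target table below are hypothetical, and the example assumes Airflow 2.4+ with the apache-spark and snowflake provider packages installed.

```python
# Illustrative sketch only: names, paths, and connection IDs are hypothetical,
# not taken from the paper.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="etl_spark_snowflake",      # hypothetical DAG name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Extract + transform: submit a PySpark job to the cluster.
    transform = SparkSubmitOperator(
        task_id="spark_transform",
        application="/opt/jobs/transform.py",  # hypothetical job script
        conn_id="spark_default",
    )

    # Load: copy the transformed files from a stage into Snowflake.
    load = SnowflakeOperator(
        task_id="snowflake_load",
        snowflake_conn_id="snowflake_default",
        sql="COPY INTO analytics.events FROM @etl_stage/events/;",  # hypothetical table and stage
    )

    transform >> load
```

In an arrangement like this, Airflow contributes scheduling and task retries (supporting the fault tolerance the abstract claims), Spark performs the distributed transform, and Snowflake's COPY INTO handles the warehouse load.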
Keywords
ETL, Apache Spark, Airflow, Snowflake, Data Engineering, Scalable Architecture
Citation
Building a Scalable ETL Pipeline with Apache Spark, Airflow, and Snowflake. Ujjawal Nayak. 2025. IJIRCT, Volume 11, Issue 2. Pages 1-3. https://www.ijirct.org/viewPaper.php?paperId=2504004