• Designed and implemented scalable ETL pipelines using PySpark to ingest, cleanse, and transform structured and semi-structured data across distributed systems.
• Built Python data ingestion pipelines that consumed REST APIs, handling authentication (bearer token, API key) and pagination for scalable data extraction.
• Performed data cleansing, transformation, and enrichment on terabyte-scale datasets using PySpark DataFrame APIs and Spark SQL.
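
The REST ingestion pattern mentioned above (authentication headers plus pagination) can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline code: the `X-API-Key` header name, the `items` and `next` response fields, and the function names are all assumptions chosen for the example, and real APIs vary in how they expose credentials and page links.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def build_headers(bearer_token: Optional[str] = None,
                  api_key: Optional[str] = None) -> dict:
    """Build request headers for either auth scheme (or both)."""
    headers = {"Accept": "application/json"}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    if api_key:
        # Assumption: the API reads the key from an "X-API-Key" header.
        headers["X-API-Key"] = api_key
    return headers


def http_get_json(url: str, headers: dict) -> dict:
    """Fetch one page over HTTP and decode the JSON body."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all(first_url: str,
              headers: dict,
              get_json: Callable[[str, dict], dict] = http_get_json) -> Iterator[dict]:
    """Yield records page by page until the API stops returning a next link.

    Assumption: each page looks like {"items": [...], "next": <url or null>}.
    """
    url = first_url
    while url:
        page = get_json(url, headers)
        yield from page.get("items", [])
        url = page.get("next")
```

Because `fetch_all` is a generator, records can be streamed to storage one page at a time instead of being held in memory, and the `get_json` parameter lets the HTTP call be swapped for a stub in tests.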