- Strong coding skills in PySpark, Python, and SQL are essential.
- Design, build, and maintain robust ETL/ELT pipelines using AWS Glue and Apache Airflow.
- Expertise in Apache Airflow DAG creation, scheduling, dependency management, and error handling.
- Strong hands-on experience with AWS Glue, including both Glue jobs (PySpark) and the Glue Data Catalog.
- Develop SQL scripts and optimize data models in Amazon Redshift.
- Orchestrate data workflows and job dependencies using Airflow DAGs.
- Collaborate with data engineers, analysts, and stakeholders to understand data requirements and deliver scalable solutions.
- Implement and enforce data quality checks, schema validation, and monitoring for all data pipelines.
- Integrate AWS services such as S3, Lambda, Step Functions, Athena, and CloudWatch.