Key Responsibilities:
- Design, build, and optimize scalable data pipelines (an illustrative sketch follows this list).
- Develop and operationalize data products across structured and unstructured data, including alternative data sources.
- Deploy, manage, and tune Spark workloads on Databricks, ensuring scalability and cost-efficiency.
- Collaborate with data science and business teams to deliver data-driven insights.
- Support CI/CD automation, infrastructure-as-code, and reusable data frameworks.
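For illustration, a minimal sketch of the kind of PySpark-on-Databricks pipeline work described above: read a source table, aggregate it, and write the result as a date-partitioned Delta table. The table and column names (raw_events, event_ts, curated.events_daily) are hypothetical, and the sketch assumes a Databricks cluster where a SparkSession is already available.

```python
# Illustrative only: table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession named `spark` already exists; this is a no-op there.
spark = SparkSession.builder.getOrCreate()

# Read a raw source table and derive a daily event count per event type.
raw = spark.read.table("raw_events")
daily = (
    raw
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Write the result as a date-partitioned Delta table.
(
    daily.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("curated.events_daily")
)
```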
Essential Skills:
- Strong experience with Python and PySpark.
- Hands-on expertise with Databricks components: Jobs, Workflows, Delta Lake, and Unity Catalog.
- Proficiency in SQL for advanced data transformations.
- Deep understanding of distributed data processing and production-grade data workflows.
- Exposure to machine learning tooling such as MLflow (an illustrative sketch follows this list).
- Experience with alternative data (web, geospatial, satellite, sentiment).
- Familiarity with Snowflake, Airflow, or other orchestration/warehousing platforms.
- Understanding of CI/CD, version control, and deployment best practices.
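For illustration, a minimal sketch of MLflow experiment tracking of the kind referenced above. The experiment path, parameters, and metric value are hypothetical; the point is only the shape of the logging API.

```python
# Illustrative only: experiment name, parameters, and metric are hypothetical.
import mlflow

mlflow.set_experiment("/Shared/example-forecasting")

with mlflow.start_run(run_name="baseline"):
    # Record the configuration and evaluation result of a hypothetical training run.
    mlflow.log_param("model_type", "gradient_boosting")
    mlflow.log_param("max_depth", 6)
    mlflow.log_metric("rmse", 12.4)
```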