Requirements
Must Haves:
- Hands-on experience with Databricks (including Unity Catalog, Delta Lake, Auto Loader, and PySpark)
- Knowledge of Medallion Architecture patterns in Databricks, with experience designing and supporting data pipelines across the Bronze/Silver/Gold layers
- Experience conducting data profiling to identify structure, completeness, and data quality issues
- Experience in Azure cloud data architecture
- Extensive experience designing and managing ETL pipelines, including Change Data Capture (CDC)
- Experience implementing role-based access control (RBAC)
- Demonstrated ability to lead data platform initiatives from requirements gathering through design, development, and deployment
Technical Knowledge (60%)
- Expert knowledge of data warehouse design methodologies, including Delta Lake and Medallion Architecture, with deep understanding of Delta Lake optimizations.
- Proficient in Azure Data Lake, Delta Lake, Azure DevOps, Git, and API testing tools like Postman.
- Strong proficiency in relational databases with expertise in writing, tuning, and debugging complex SQL queries.
- Experienced in integrating and managing REST APIs for downstream systems like MDM and FHIR services.
- Skilled in designing and optimizing ETL/ELT pipelines in Databricks using PySpark, SQL, and Delta Live Tables, including implementing Change Data Capture (batch and streaming).
- Experienced in metadata-driven ingestion and transformation pipelines with Python and PySpark.
- Familiar with Unity Catalog structure and management, including configuring fine-grained permissions and workspace ACLs for secure data governance.
- Ability to lead logical and physical data modeling across lakehouse layers (Bronze, Silver, Gold) and define business and technical metadata.
- Experienced with Databricks job and all-purpose cluster configuration, optimization, and DevOps practices such as notebook versioning and environment management.
- Proficient in assessing and profiling large volumes of data to ensure data quality and support business rules.
- Able to collaborate effectively with ETL developers and business analysts to translate user stories into technical pipeline logic.
General Skills (40%)
- 5+ years of experience in data engineering, ideally in cloud data lake environments
- Ability to translate business requirements into scalable data architectures, data models, and governance frameworks
- Able to serve as a technical advisor during sprint planning and backlog grooming
- Skilled in conducting data discovery, profiling, and quality assessments to guide architecture and modeling decisions
- Capable of conducting performance diagnostics and root cause analysis across multiple layers (DB, ETL, infrastructure)
- Strong communication skills for working with business stakeholders, developers, and executives
- Passion for mentoring, training, and establishing reusable frameworks and best practices
- Experience with agile practices, including sprints, user stories, and iterative development, especially in data-focused teams
- Experience grooming and assembling requirements into coherent user stories and use cases, managing and refining Product Backlog Items, and communicating changes to the Project Manager/Team Lead
- Ability to analyze current and future data needs, data flows, and data governance practices to support enterprise data strategies
- Ability to lead data discovery efforts and participate in the design of data models and data integration solutions