Key Responsibilities:
- Architect and implement scalable data lakehouse solutions using Databricks on AWS, including Delta Lake, Unity Catalog, and Lakehouse AI features.
- Develop and orchestrate multi-agent AI applications leveraging LangGraph, LangChain, and other LLM toolchains to support business workflows and automation.
- Design and implement vector database search solutions for retrieval-augmented generation (RAG), semantic search, and contextual memory.
- Work with open table and file formats including Apache Iceberg, Delta Lake, and Parquet, ensuring performance optimization, schema evolution, and ACID compliance.
- Integrate LLM orchestration pipelines into Lakehouse workflows, enabling intelligent data retrieval, transformation, and reasoning.
- Collaborate with data scientists, ML engineers, and platform teams to build end-to-end AI-powered data pipelines.
- Build reusable components, agent workflows, and vector indexing strategies for enterprise AI use cases.
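To make the vector-search and RAG responsibilities above concrete, here is a minimal, dependency-free sketch of the retrieval step: rank stored document embeddings against a query embedding by cosine similarity. The embeddings, document ids, and dimensions are hypothetical stand-ins for real model output (in practice an embedding endpoint and a vector store such as FAISS or Databricks Vector Search would replace this).

```python
import math

# Toy in-memory vector index illustrating the retrieval step of RAG.
# The 3-d vectors and doc ids are hand-made, hypothetical stand-ins
# for real embedding-model output.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    """Return the k document ids most similar to the query vector."""
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

index = {
    "doc_sales": [0.9, 0.1, 0.0],
    "doc_hr":    [0.0, 0.9, 0.1],
    "doc_ops":   [0.1, 0.0, 0.9],
}

# A query embedding close to the "sales" document.
print(top_k([0.8, 0.2, 0.1], index, k=1))  # → ['doc_sales']
```

A production system swaps the dict for an approximate-nearest-neighbor index and feeds the retrieved documents into the LLM prompt; the ranking logic shown here is the same idea at toy scale.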
Required Qualifications:
- 5+ years of experience in data engineering or AI/ML engineering.
- Hands-on expertise with Databricks on AWS, including workspace management, notebooks, MLflow, and Unity Catalog.
- Experience with Lakehouse AI, Vector Search, and Agentic AI (LangGraph, CrewAI, OpenAgents, or similar frameworks).
- Proficient in Python and SQL, with experience in developing data transformation logic and vector embedding workflows.
- Strong knowledge of open table and file formats such as Apache Iceberg, Delta Lake, Parquet, and Avro.
- Experience deploying and optimizing vector databases such as FAISS, Chroma, Weaviate, or native Databricks vector search.
- Familiarity with LLMOps practices and tooling (e.g., LangSmith, PromptLayer, Weights & Biases).
- Understanding of LLM APIs (OpenAI, HuggingFace), prompt engineering, and memory management in agentic systems.
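The agentic-framework experience listed above (LangGraph, CrewAI, and similar) boils down to one pattern: nodes that transform a shared state, plus edges that route between them. The sketch below hand-rolls that pattern in plain Python; all node names and state keys are hypothetical, and no real framework or LLM call is involved.

```python
# Minimal hand-rolled state graph illustrating the node-and-edge
# pattern that frameworks such as LangGraph formalize. Each node
# takes the shared state dict, mutates it, and returns it; EDGES
# decides which node runs next. Names are hypothetical.

def retrieve(state):
    # Stand-in for a vector-search call.
    state["context"] = f"docs matching '{state['question']}'"
    return state

def answer(state):
    # Stand-in for an LLM call that uses the retrieved context.
    state["answer"] = f"Based on {state['context']}: ..."
    return state

NODES = {"retrieve": retrieve, "answer": answer}
EDGES = {"retrieve": "answer", "answer": None}  # linear two-node graph

def run(state, entry="retrieve"):
    node = entry
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node]
    return state

result = run({"question": "quarterly revenue"})
print(result["answer"])
```

Real frameworks add conditional routing, cycles, checkpointing, and memory on top of this loop, but the core execution model is the same.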
Preferred Qualifications:
- Databricks certifications (Data Engineer, Machine Learning Associate, Generative AI Engineer).
- Experience with LangChain Expression Language (LCEL) and LangGraph state graphs for agent workflows.
- Familiarity with Databricks Model Serving, Feature Store, and production-grade LLM integrations.
- Exposure to event-driven architectures, CI/CD for ML, and secure data governance using Unity Catalog.
- Experience collaborating with cross-functional teams, stakeholders, and business leaders to align data strategies with organizational goals.
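For candidates unfamiliar with LCEL, the idea it popularized is pipe-style chain composition: each stage is a callable and `|` wires them together. The toy below reproduces that composition style in plain Python with no LangChain dependency; the stage names and the fake model call are hypothetical.

```python
# Pure-Python stand-in for LCEL-style pipe composition: wrapping
# callables so `|` chains them left to right. No LangChain needed;
# `fake_llm` is a placeholder for a real model call.

class Runnable:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Compose: run self first, then feed the result to `other`.
        return Runnable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

prompt = Runnable(lambda q: f"Answer briefly: {q}")
fake_llm = Runnable(lambda p: p.upper())  # stand-in for a model call
parse = Runnable(lambda out: out.strip())

chain = prompt | fake_llm | parse
print(chain.invoke("what is a lakehouse?"))  # → 'ANSWER BRIEFLY: WHAT IS A LAKEHOUSE?'
```

In real LCEL the pipeline stages are prompts, models, and output parsers, and the runtime adds batching, streaming, and tracing on top of the same composition operator.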