Key Responsibilities:
- Design, develop, and implement machine learning models using Spark ML for predictive analytics
- Build and optimize end-to-end training and inference pipelines on distributed platforms (a minimal PySpark sketch follows this list)
- Process and analyze large datasets to uncover insights and engineer effective features
- Work closely with data engineers to integrate ML models into existing data pipelines
- Fine-tune models and hyperparameters to maximize predictive performance
- Develop scalable solutions for both real-time and batch inference
- Continuously monitor deployed models and address any performance issues
- Stay up to date with the latest tools, frameworks, and best practices in machine learning and distributed computing
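
To give a concrete sense of the pipeline work these responsibilities describe, below is a minimal, hypothetical PySpark sketch of a training-and-batch-inference flow. The column names ("age", "income", "label"), the toy data, and the choice of logistic regression are illustrative assumptions, not requirements stated in this posting.

```python
# Minimal sketch of an end-to-end Spark ML training/inference pipeline.
# Column names and toy data are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("predictive-pipeline").getOrCreate()

# Toy training data standing in for a large distributed dataset.
train = spark.createDataFrame(
    [(25.0, 40000.0, 0.0), (47.0, 92000.0, 1.0),
     (33.0, 61000.0, 0.0), (52.0, 120000.0, 1.0),
     (28.0, 48000.0, 0.0), (41.0, 88000.0, 1.0),
     (36.0, 70000.0, 0.0), (58.0, 135000.0, 1.0)],
    ["age", "income", "label"],
)

# Feature engineering and the model are chained into one Pipeline, so the
# same transformations are applied identically at training and inference time.
assembler = VectorAssembler(inputCols=["age", "income"], outputCol="raw_features")
scaler = StandardScaler(inputCol="raw_features", outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, scaler, lr])

model = pipeline.fit(train)

# Batch inference: the fitted PipelineModel scores new records.
new_data = spark.createDataFrame([(29.0, 55000.0)], ["age", "income"])
model.transform(new_data).select("prediction", "probability").show()
```

The same fitted PipelineModel can be saved and reloaded for either batch scoring jobs or a real-time serving layer, which is why the feature steps are kept inside the pipeline rather than applied ad hoc.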
Required Qualifications:
- 10+ years of experience as a Machine Learning Engineer or in a similar role
- Expertise in Apache Spark and Spark MLlib
- Solid understanding of predictive modeling techniques (e.g., regression, classification, clustering)
- Hands-on experience with distributed systems such as Hadoop
- Proficiency in Python, Scala, or Java
- Strong grasp of data preprocessing and feature engineering methodologies
- Familiarity with model evaluation metrics and production deployment practices (see the tuning and evaluation sketch after this list)
- In-depth knowledge of distributed computing and parallel processing principles
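
As a rough illustration of the tuning and evaluation skills listed above, the sketch below reuses the hypothetical pipeline, lr, and train objects from the earlier example. The parameter grid, the metric choice (area under ROC), and the fold count are illustrative assumptions.

```python
# Minimal sketch of hyperparameter tuning and evaluation with Spark ML.
# Assumes the `pipeline`, `lr`, and `train` objects from the previous sketch.
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Grid over the regularization strength of the logistic regression stage.
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 1.0])
        .build())

# Evaluate each candidate with area under the ROC curve.
evaluator = BinaryClassificationEvaluator(labelCol="label",
                                          metricName="areaUnderROC")

cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=grid,
                    evaluator=evaluator,
                    numFolds=2,
                    parallelism=2)

cv_model = cv.fit(train)
print("Best cross-validated AUC:", max(cv_model.avgMetrics))
```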