Requirements
• 10+ years of industry experience spanning machine learning engineering and distributed systems,
• 3+ years of leadership and management experience, with a proven ability to build and lead strong technical teams,
• MSc or PhD in Computer Science, Machine Learning, or a related field, or equivalent practical experience,
• Proven expertise in building and deploying end-to-end ML systems at scale, including recommendation and personalization systems,
• Strong background in distributed systems architecture, including low-latency services, streaming platforms, and large-scale serving,
• Hands-on experience with deep learning frameworks (e.g., TensorFlow, PyTorch) and ML infrastructure technologies,
• Track record of delivering high-quality, scalable, and fault-tolerant systems,
• Excellent communication skills and the ability to influence product and technical strategy,
• Proven experience deploying large-scale serving systems on AWS and demonstrated expertise in leveraging Databricks for large-scale data processing and ML workflows
What the job involves
We are seeking a Director of Machine Learning Engineering and Infrastructure to lead a hybrid team bridging advanced ML engineering with world-class infrastructure design. In this role, you will own the strategic direction and execution for scaling our machine learning capabilities while ensuring our distributed systems and infrastructure can support innovation at massive scale. You will combine technical depth with leadership excellence to guide teams that deliver both foundational ML systems and high-performance distributed services.
• Lead and manage high-performing teams across ML engineering and ML infrastructure, fostering a culture of innovation, collaboration, and growth,
• Define and execute the strategic roadmap for ML systems, including recommendation, personalization, and ads optimization,
• Oversee the design, development, and deployment of scalable ML pipelines: data ingestion, feature engineering, model training, evaluation, and serving,
• Architect distributed systems to support ML workloads at scale, ensuring reliability, observability, and operational excellence,
• Partner closely with Product, Engineering, and Content teams to align on business goals and deliver impactful ML-driven experiences,
• Support best practices in experimentation, evaluation, and ML system monitoring,
• Ensure cost efficiency, scalability, and performance in ML infrastructure investments