Roles and Responsibilities
• Design, build, and maintain scalable and reliable data pipelines for dataset creation, transformation, and benchmarking
• Own and optimize Airflow pipelines on AWS for data processing, orchestration, and evaluation workflows
• Write efficient, production-grade SQL and Python code for large-scale data processing and analysis
• Partner closely with ML engineers to enable model training, evaluation, and benchmarking pipelines
• Improve pipeline performance, reliability, and observability, ensuring high data quality in production
• Build and maintain systems to support model performance tracking and data drift monitoring
• Troubleshoot and resolve data issues across pipelines, ensuring minimal impact on ML workflows
• Contribute to data architecture decisions and best practices across the platform
• Collaborate cross-functionally with ML, platform, and data teams to support scalable ML infrastructure
What We're Looking For
• 3–5 years of experience in Data Engineering, Data Platforms, or related roles
• Strong proficiency in Python and SQL with experience in production systems
• Hands-on experience with AWS services (S3, EC2, SageMaker, or similar)
• Solid experience building and managing pipelines with Airflow (or similar orchestration tools)
• Strong understanding of data engineering fundamentals (ETL/ELT, data modeling, pipeline design)
• Experience working with large-scale datasets and distributed data systems
• Experience supporting ML workflows, datasets, or evaluation pipelines
• Strong problem-solving skills and ability to work independently in a fast-paced environment
Nice to Have
• Experience with ML infrastructure, MLOps, or model evaluation workflows
• Exposure to biometric systems or computer vision datasets
• Familiarity with data quality frameworks, monitoring, and observability tools
• Experience working in SaaS or high-scale production environments