Posted Apr 16, 2026

Data Engineer: Scalable Pipelines for ML Workflows

Roles and Responsibilities
• Design, build, and maintain scalable, reliable data pipelines for dataset creation, transformation, and benchmarking
• Own and optimize Airflow pipelines on AWS for data processing, orchestration, and evaluation workflows
• Write efficient, production-grade SQL and Python code for large-scale data processing and analysis
• Partner closely with ML engineers to enable model training, evaluation, and benchmarking pipelines
• Improve pipeline performance, reliability, and observability, ensuring high data quality in production
• Build and maintain systems to support model performance tracking and data drift monitoring
• Troubleshoot and resolve data issues across pipelines, ensuring minimal impact on ML workflows
• Contribute to data architecture decisions and best practices across the platform
• Collaborate cross-functionally with ML, platform, and data teams to support scalable ML infrastructure

What We're Looking For
• 3–5 years of experience in Data Engineering, Data Platforms, or related roles
• Strong proficiency in Python and SQL, with experience in production systems
• Hands-on experience with AWS services (S3, EC2, SageMaker, or similar)
• Solid experience building and managing Airflow (or similar orchestration tools)
• Strong understanding of data engineering fundamentals (ETL/ELT, data modeling, pipeline design)
• Experience working with large-scale datasets and distributed data systems
• Experience supporting ML workflows, datasets, or evaluation pipelines
• Strong problem-solving skills and the ability to work independently in a fast-paced environment

Nice to Have
• Experience with ML infrastructure, MLOps, or model evaluation workflows
• Exposure to biometric systems or computer vision datasets
• Familiarity with data quality frameworks, monitoring, and observability tools
• Experience working in SaaS or high-scale production environments
Interested in this role? Apply on iHire.