

MLOps Engineer - AWS
Our client, a leading technology-driven organization, is seeking an MLOps Engineer to support their AI/ML infrastructure and optimize their machine learning workflows. This hybrid role is based in San Francisco, CA, and requires expertise in machine learning operations, model deployment, and data engineering support. The ideal candidate will play a critical role in transitioning ML models from AWS SageMaker to DataRobot, maintaining infrastructure, and optimizing workflow orchestration.
Key Responsibilities:
MLOps Platform Support:
• Lead the transition from AWS SageMaker to DataRobot for model development and deployment.
• Optimize the model training, validation, and deployment pipeline using DataRobot.
• Automate and maintain CI/CD workflows for ML model deployment, ensuring efficient updates and performance monitoring (a deployment smoke-test sketch follows this list).
• Develop and maintain Python-based components supporting ML model workflows and inference pipelines.
• Work closely with Data Scientists and ML Engineers to fine-tune model deployment strategies, ensuring optimal performance and scalability.
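For illustration only: a minimal sketch of the kind of post-deployment smoke test a CI/CD workflow like the one described above might run, assuming the model is exposed over a REST prediction endpoint. The endpoint URL, auth token, and sample record are hypothetical placeholders, not details from this role.

    import os

    import requests

    # Hypothetical endpoint and token, supplied by the CI environment;
    # the real URL and auth scheme depend on the serving platform.
    PREDICTION_URL = os.environ["PREDICTION_URL"]
    API_TOKEN = os.environ["API_TOKEN"]

    def smoke_test_deployment(sample_record: dict, timeout: int = 30) -> None:
        """Post one sample record to the deployed model and fail the
        CI job if the endpoint errors or returns no predictions."""
        response = requests.post(
            PREDICTION_URL,
            json=[sample_record],  # many REST scoring APIs accept a JSON list of rows
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            timeout=timeout,
        )
        response.raise_for_status()  # surface 4xx/5xx errors to the pipeline
        predictions = response.json()
        assert predictions, "endpoint returned an empty prediction payload"
        print(f"Smoke test passed: {len(predictions)} prediction(s) returned")

    if __name__ == "__main__":
        smoke_test_deployment({"feature_a": 1.0, "feature_b": "x"})

A gate like this lets the CI/CD workflow promote a model only after the live endpoint has actually answered a request.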
Airflow Platform Administration:
• Administer and enhance the central Airflow platform using Amazon Managed Workflows for Apache Airflow (MWAA).
• Manage, deploy, and optimize Airflow DAGs, ensuring high availability and adherence to best practices.
• Implement Airflow secrets management (AWS Secrets Manager) for secure handling of credentials and connection details (a minimal DAG sketch follows this list).
• Monitor DAG execution, troubleshoot failures, and optimize task scheduling to ensure efficient processing.
• Work with cross-functional teams to review and approve workflow modifications, ensuring compliance with security and scalability best practices.
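For illustration only: a minimal DAG sketch in the Airflow 2.x TaskFlow style, reading a credential directly from AWS Secrets Manager with boto3. The secret name and schedule are placeholder assumptions; on MWAA one would more often configure the Secrets Manager secrets backend so that Airflow Connections and Variables resolve from it transparently.

    from datetime import datetime, timedelta

    from airflow.decorators import dag, task

    @dag(
        schedule="@daily",  # placeholder schedule
        start_date=datetime(2024, 1, 1),
        catchup=False,  # do not backfill historical runs
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    )
    def example_secret_read():
        @task
        def call_partner_api() -> None:
            # Import and fetch inside the task so the secret is read at
            # run time, never at DAG-parse time, and never lands in XCom.
            import boto3

            client = boto3.client("secretsmanager")
            api_key = client.get_secret_value(SecretId="my/app/api-key")["SecretString"]
            print(f"Retrieved a credential of length {len(api_key)}")

        call_partner_api()

    example_secret_read()

Keeping the secret read inside the task body is the design point: the scheduler re-parses DAG files constantly, so anything at module level runs far more often than the task itself.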
Technical Skills & Requirements:
• Programming: Proficiency in Python for scripting, automation, and ML pipeline development.
• Orchestration & Workflow Management: Experience with Apache Airflow (DAG authoring, administration, and optimization).
• Cloud Platforms & Services: Strong hands-on expertise in AWS, particularly S3, SageMaker, MWAA, ECS, and Secrets Manager.
• Infrastructure as Code (IaC): Proficiency in Terraform for provisioning cloud resources and managing infrastructure.
• CI/CD & Automation: Hands-on experience with GitHub Actions or similar CI/CD tools for automating deployments.
• Containerization & OS: Working knowledge of Docker and Linux-based systems for managing ML workloads.
• APIs & Integration: Experience with REST APIs for data exchange between ML models and applications.
Preferred Qualifications:
• Prior experience migrating ML workloads from AWS SageMaker to DataRobot.
• Experience in MLOps best practices, including model versioning, monitoring, and automated retraining.
• Familiarity with Data Engineering workflows, including ETL/ELT pipelines and real-time data streaming.
• Strong troubleshooting skills for workflow failures, cloud infrastructure issues, and model performance tuning.