

MLOps L2 Support
MLOps L2 Support Engineer (On-Call & Weekend Support)
Job Summary:
We are seeking an MLOps L2 Support Engineer to provide 24/7 production support for machine learning (ML) and data pipelines. The role requires on-call support, including weekends, to ensure high availability and reliability of ML workflows. The candidate will work with Dataiku, AWS, CI/CD pipelines, and containerized deployments to maintain and troubleshoot ML models in production.
Key Responsibilities:
Incident Management & Support:
• Provide L2 support for MLOps production environments, ensuring uptime and reliability.
• Troubleshoot ML pipelines, data processing jobs, and API issues.
• Monitor logs, alerts, and performance metrics using Dataiku, Prometheus, Grafana, or AWS tools such as CloudWatch.
• Perform root cause analysis (RCA) and resolve incidents within SLAs.
• Escalate unresolved issues to L3 engineering teams when needed.
Dataiku Platform Management:
• Manage Dataiku DSS workflows, troubleshoot job failures, and optimize performance.
• Monitor and support Dataiku plugins, APIs, and automation scenarios.
• Collaborate with Data Scientists and Data Engineers to debug ML model deployments.
• Perform version control and CI/CD integration for Dataiku projects.
Deployment & Automation:
• Support CI/CD pipelines for ML model deployment (Bamboo, Bitbucket, etc.).
• Deploy ML models and data pipelines using Docker, Kubernetes, or Dataiku Flow.
• Automate monitoring and alerting for ML model drift, data quality, and performance.
Cloud & Infrastructure Support:
• Monitor AWS-based ML workloads (SageMaker, Lambda, ECS, S3, RDS).
• Manage storage and compute resources for ML workflows.
• Support database connections, data ingestion, and ETL pipelines (SQL, Spark, Kafka).
Security & Compliance:
• Ensure secure access control for ML models and data pipelines.
• Support audit, compliance, and governance for Dataiku and MLOps workflows.
• Respond to security incidents related to ML models and data access.