

Kubeflow Engineer
Job Overview
We are looking for a talented Kubeflow Engineer to join our innovative team. You’ll play a pivotal role in deploying, managing, and optimizing scalable and production-ready machine learning workflows using Kubeflow and Kubernetes.
We offer exceptional flexibility, welcoming candidates interested in part-time, full-time, contract, freelance, or remote working arrangements. Whether you are seeking full-time stability or part-time/flexible engagements to align with your existing commitments, we encourage you to apply.
The ideal candidate will collaborate closely with Data Scientists, ML Engineers, DevOps Engineers, and product teams to streamline and automate machine learning operations (MLOps), making a direct impact on the efficiency, reliability, and scalability of our AI-driven solutions.
Key Responsibilities
Kubeflow Environment Management
• Deploy, configure, maintain, and upgrade Kubeflow clusters.
• Ensure environments are secure, stable, scalable, and highly available.
Design & Implement ML Pipelines
• Build and maintain end-to-end ML pipelines using Kubeflow Pipelines.
• Develop pipeline components for data ingestion, preprocessing, training, validation, hyperparameter optimization (Katib), model serving (KFServing/KServe), and monitoring.
MLOps & Automation
• Integrate Kubeflow into CI/CD frameworks (GitHub Actions, GitLab CI/CD, ArgoCD).
• Enable automated deployments, continuous integration, and continuous delivery of ML workflows.
Performance Optimization & Troubleshooting
• Monitor infrastructure performance, pipeline efficiency, and resource utilization.
• Troubleshoot and resolve operational challenges quickly and effectively.
Collaboration & Knowledge Sharing
• Work closely with multidisciplinary teams to align ML infrastructure with business objectives.
• Mentor or advise team members on Kubeflow and MLOps best practices.
Security, Compliance & Governance
• Ensure security best practices, compliance standards, and governance procedures are implemented and maintained across Kubeflow environments.
Skills & Qualifications
Essential Technical Skills:
Kubeflow & Kubernetes
• Strong proficiency in deploying and managing Kubeflow clusters, Kubeflow Pipelines, Katib, and KFServing/KServe.
• Expert-level knowledge of Kubernetes (kubectl, Helm, Kustomize, operators).
Cloud & Infrastructure Management
• Experience with cloud Kubernetes platforms (AWS EKS, Azure AKS, Google GKE).
• Knowledge of Infrastructure-as-Code tools (Terraform, Ansible, Pulumi, CloudFormation).
Containerization & CI/CD
• Experience with Docker containerization and container registry management.
• Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins, ArgoCD).
Programming & Scripting
• Proficiency in Python scripting, Bash/Shell, YAML manifests, and automation tools.
Monitoring & Observability
• Practical experience with Prometheus, Grafana, Elastic Stack, Jaeger, or similar observability tools.
Machine Learning Frameworks
• Familiarity with ML/DL frameworks (TensorFlow, PyTorch, XGBoost, scikit-learn).
• Understanding of data and artifact management tools (MLflow, DVC, MinIO, cloud storage).
Experience & Qualifications
Essential:
• 2 to 5+ years of experience in MLOps, DevOps, or ML engineering roles.
• Minimum 2 years of hands-on Kubeflow implementation and production management experience.
Education:
• Bachelor's or Master's degree in Computer Science, Data Science, Software Engineering, or related technical fields, or equivalent practical experience.
Competencies & Attributes
• Strong analytical and problem-solving abilities.
• Clear communication and interpersonal skills.
• Proactive, self-driven, and comfortable working in flexible, autonomous environments.
• Capacity to balance multiple tasks and priorities effectively.