

SRE/Platform Engineer – Observability & Monitoring
We are seeking an experienced SRE/Platform Engineer – Observability & Monitoring to contribute to the design, development, and operational excellence of a Foundational Observability Platform. This role involves implementing Kubernetes-based observability solutions, ensuring system reliability, scalability, and security, and enhancing monitoring capabilities using industry-leading tools.
Key Responsibilities:
• Design and develop technical solutions for a Kubernetes-based Observability Platform.
• Implement and manage Kubernetes controllers, Crossplane compositions, and GitOps-based deployment of CNCF components.
• Optimize monitoring, logging, and tracing solutions using tools like Splunk, Grafana, Datadog, New Relic, and Dynatrace.
• Ensure high availability, security, and resilience of observability platforms in cloud environments.
• Collaborate with cross-functional teams, including product managers and engineers, to enhance platform capabilities.
• Contribute to best practices in Site Reliability Engineering (SRE) and Observability.
• Provide technical mentorship to junior engineers and assist in improving development standards.
• Participate in on-call rotations to support observability platforms and troubleshoot production issues.
Required Qualifications:
• 5+ years of professional experience in software development and infrastructure automation.
• 2+ years of hands-on experience with Kubernetes in production.
• Strong knowledge of large-scale distributed systems and cloud computing platforms (AWS, GCP, or Azure).
• Experience with Observability tools such as Splunk, Grafana, Datadog, New Relic, or Dynatrace.
• Proficiency in Golang or another relevant programming language.
• Strong understanding of networking protocols and cloud security best practices.
Preferred Qualifications:
• Kubernetes certification (CKA, CKAD, or CKS).
• Experience with ArgoCD or other GitOps tools.
• Hands-on experience developing Kubernetes controllers in Golang.
• Knowledge of Crossplane, Terraform, or other Infrastructure-as-Code (IaC) tools.
• Experience defining and maintaining SLAs, SLOs, and SLIs for platform governance.
• Familiarity with global compliance and governance requirements for observability platforms.
Why Join Us?
This is more than just a job – it's an opportunity to be part of an innovative, fast-paced environment that values creativity, collaboration, and cutting-edge technology. If you're passionate about observability, reliability, and cloud engineering, this is your chance to make an impact on a global scale!
About Brickred Systems:
Brickred Systems is a global leader in next-generation technology, consulting, and business process services. We help clients navigate their digital transformation. Brickred Systems delivers a range of consulting services to clients across multiple industries around the world. Our practices employ highly skilled and experienced professionals with a client-centric passion for innovation and delivery excellence.
With ISO 27001 and ISO 9001 certifications and over a decade of experience managing the systems and operations of global enterprises, we harness cognitive computing, hyper-automation, robotics, cloud, analytics, and emerging technologies to help our clients adapt to the digital world and succeed. Our always-on learning agenda drives our clients' continuous improvement by building and transferring digital skills, expertise, and ideas from our innovation ecosystem.