Job Description

Cisco ThousandEyes is seeking a Senior Site Reliability Engineer (SRE) to join its Production Engineering team in London. The ideal candidate will have a strong background in SaaS and operations, with expertise in designing and managing large-scale, highly available distributed systems in the cloud. This role involves collaborating with application development teams to enhance the reliability, performance, and security of the ThousandEyes platform.

What this role involves:

  • Collaborating with software engineers to optimize architecture and services.
  • Designing and implementing scalable operations tooling.
  • Designing, deploying, and maintaining AWS cloud-native services.
  • Participating in 24x7 incident response and on-call rotation.
  • Using and expanding CNCF solutions like Kubernetes, Service Mesh, Prometheus, OpenTelemetry, and ArgoCD.
  • Automating production operations.
  • Developing automation solutions for scalable service and platform operations.
  • Staying updated on industry best practices for scalability and reliability.
  • Identifying and resolving common obstacles to operational excellence.
  • Generalizing and standardizing solutions and processes.
  • Playing a key role in the ThousandEyes platform through scale testing.
  • Managing a rapidly growing infrastructure.

Requirements:

  • Expert-level knowledge of Kubernetes and its ecosystem.
  • Proficiency in software development with languages such as Python or Go.
  • In-depth knowledge of cloud providers, preferably AWS.
  • Proven ability to build and implement scalable and well-tested solutions.
  • Strong understanding of Unix/Linux systems.
  • Knowledge of Site Reliability principles.

What this role offers:

  • Opportunity to work on a leading Digital Experience Assurance platform.
  • Collaboration with a diverse and talented team.
  • A hybrid work approach with at least one day a week in the London office.
  • Exposure to cutting-edge technologies and industry best practices.

Cisco ThousandEyes

Cisco ThousandEyes is a Digital Experience Assurance platform that helps organizations ensure optimal digital experiences across all networks. Leveraging AI and comprehensive telemetry data from cloud, internet, and enterprise networks, ThousandEyes enables proactive detection, diagnosis, and remediation of issues. Integrated within Cisco's technology portfolio, it delivers AI-driven insights for networking, security, collaboration, and observability, facilitating scalable deployments and enhanced end-user experiences.
