Job Description

Cisco ThousandEyes is seeking a Senior Site Reliability Engineer (SRE) to join its Network Assurance Data Platform team. This role is crucial for ensuring the reliability, scalability, and security of Cisco's cloud and big data platforms. The SRE will collaborate with cross-functional teams to design, build, and maintain systems operating at multi-region scale, work that directly affects the success of machine learning (ML) and AI initiatives.

What This Role Involves:

  • Designing, building, and optimizing cloud and data infrastructure.
  • Implementing SRE principles such as monitoring, alerting, and fault analysis.
  • Collaborating with development, product management, and security teams.
  • Troubleshooting complex technical issues in production environments.
  • Contributing to continuous improvement efforts.
  • Shaping the team’s technical strategy and roadmap.
  • Mentoring peers and fostering a culture of learning.

Requirements:

  • Ability to design and implement scalable and well-tested solutions.
  • Strong hands-on experience in cloud, preferably AWS.
  • Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes.
  • Previous experience in AWS cost management.
  • Understanding of Prometheus and its ecosystem, including Alertmanager.
  • Ability to write high-quality code in Python, Go, or equivalent languages.
  • Good understanding of Unix/Linux systems, the kernel, system libraries, file systems, and client-server protocols.

What This Role Offers:

  • Opportunity to work on cutting-edge cloud and big data platforms.
  • Collaboration with cross-functional teams.
  • A chance to shape the technical strategy and roadmap.
  • Mentoring opportunities and a culture of learning.

Cisco ThousandEyes

Cisco ThousandEyes is a Digital Experience Assurance platform that helps organizations ensure optimal digital experiences across all networks. Leveraging AI and comprehensive telemetry data from cloud, internet, and enterprise networks, ThousandEyes enables proactive detection, diagnosis, and remediation of issues. Integrated within Cisco's technology portfolio, it delivers AI-driven insights for networking, security, collaboration, and observability, facilitating scalable deployments and enhanced end-user experiences.
