Job Description
Cisco ThousandEyes is seeking a Principal Site Reliability Engineer to join their Datastores team. This team is responsible for the reliability of the platform's mission-critical datastores, including Elasticsearch, Kafka, MongoDB, and MySQL. The Principal Site Reliability Engineer will focus on innovation and providing technical vision, working with the team to build reliable, scalable, and highly available datastores on a constantly growing multi-region platform. They will partner with leaders across ThousandEyes as a datastores subject matter expert to help design architectures and processes, and serve as a role model for the Engineering team.
What this role involves:
- Ensuring the ThousandEyes platform's services use the right datastores infrastructure.
- Designing and optimizing datastores for availability, latency, and performance.
- Collaborating to formulate a compelling vision for the systems the team owns.
- Writing software and automating systems to enable datastores to scale effortlessly.
- Working across Engineering and Product Management to shape the future direction of the platform's datastores.
- Mentoring and up-leveling the team.
Requirements:
- Deep knowledge of datastores.
- Experience building and supporting mission-critical datastores.
- Strong technical vision and communication skills.
- Expertise in reliability and automation.
- Ability to design and implement scalable and well-tested solutions.
- Proficiency in Python, Go, or equivalent languages.
- Strong Infrastructure as Code skills with Terraform and Kubernetes.
- Good knowledge of cloud provider managed services (ideally AWS).
- Good understanding of Unix/Linux systems and client-server protocols.
- Strong communication and documentation skills.
What this role offers:
- Opportunity to work on a constantly growing multi-region platform.
- Chance to innovate and provide technical vision.
- Collaboration with leaders across ThousandEyes.
- Opportunity to mentor and up-level the team.