Job Description
Cisco ThousandEyes is seeking a Senior Site Reliability Engineer to join its Datastores team in Mexico City. This team focuses on the platform's mission-critical datastores, including Elasticsearch, Kafka, MongoDB, and MySQL. The role involves managing all aspects of these datastores, such as availability, performance, change management, capacity planning, monitoring, and incident response. The ideal candidate will collaborate with application development teams to ensure the reliability and performance of the infrastructure, which handles a large volume of incoming data daily.
Responsibilities of the role include:
- Collaborating with software engineers to optimize the datastores infrastructure for availability, latency, and performance.
- Building and supporting mission-critical services with a focus on automation, availability, and performance.
- Designing, implementing, and maintaining elastic and resilient datastores that support the platform's multi-region scale.
- Driving and building automation to enable effortless scaling of datastores.
- Participating in and contributing to the 24x7 incident response and on-call rotation.
Requirements for the role include:
- Ability to design and implement scalable and well-tested solutions, with a focus on datastores.
- Ability to write high-quality code in Python, Go, or equivalent languages.
- Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes.
- Good knowledge of cloud provider managed services (ideally AWS).
- Good understanding of Unix/Linux systems, the kernel, system libraries, file systems, and client-server protocols.
- Strong communication and documentation skills.
- Experience running highly performant and highly available MySQL, MongoDB, DynamoDB, and/or Apache Druid databases.
The role offers:
- The opportunity to work with cutting-edge datastore technologies.
- A collaborative environment with software engineers.
- The chance to contribute to a platform that handles a very high volume of data.