Job Description
Pendo is seeking a Sr. Site Reliability Engineer (SRE) to join its team in Sheffield, UK. The SRE team is responsible for provisioning and maintaining cloud infrastructure and for keeping the product reliable, performant, and cost-efficient.
The role involves working with developers and product managers to understand service level objectives, design systems, and ensure cloud infrastructure security. The ideal candidate will have experience with cloud infrastructure, strong programming skills, and the ability to troubleshoot distributed systems.
Role Responsibilities:
- Write high-quality infrastructure-as-code for automation.
- Write maintainable code for product functionality.
- Collaborate with engineers on service design and monitoring.
- Debug production issues and implement preventative measures.
- Maintain and automate runbooks.
- Proactively track capacity and performance limits.
- Participate in a 24x7 on-call rotation.
Minimum Qualifications:
- Bachelor's Degree in Computer Science or related field.
- Minimum of five (5) years of professional technical experience.
- Experience with infrastructure-as-code and configuration management tools such as Ansible or Terraform.
- Strong programming skills in Go or Python.
- Ability to analyze systems for failure modes and bottlenecks.
Preferred Qualifications:
- Minimum of five (5) years of experience as a Site Reliability Engineer or DevOps Engineer.
- Experience designing, analyzing, and troubleshooting distributed systems.
- Experience maintaining Kubernetes clusters in a production environment.
What Pendo Offers:
- Opportunity to work on a high-throughput platform processing billions of events daily.
- Hands-on experience with Google Kubernetes Engine (GKE) and other Google technologies.
- Exposure to a diverse and exciting set of technologies and clients.