Job Description
Pendo is seeking a Site Reliability Engineer to join its SRE team. The SRE team is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives. The ideal candidate will work with developers and product managers to ensure that Pendo's products are reliable, performant, and cost-efficient.
Role Responsibilities:
  • Write high-quality infrastructure-as-code to automate infrastructure provisioning, deployment, and scaling.
  • Write maintainable code for product functionality, emphasizing operations, scale, resiliency, and monitoring.
  • Collaborate with engineers to ensure new services are well-designed and monitored.
  • Debug production issues, mitigate them quickly, and find ways to prevent them.
  • Maintain runbooks for manual tasks, and automate those tasks.
  • Proactively track capacity, quotas, and performance limits.
  • Participate in a 24x7 on-call rotation.
Minimum Qualifications:
  • Experience with cloud infrastructure tools like Ansible or Terraform.
  • Programming skills in Go or Python.
  • Ability to think about systems in terms of failure modes and bottlenecks.
  • Ability to write clear documentation for incident runbooks and release processes.
  • Good number sense for performance, cost, and operational metrics analysis.
Preferred Qualifications:
  • Experience designing, analyzing, and troubleshooting distributed systems.
  • Experience maintaining Kubernetes clusters in production.
  • Previous experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
Pendo offers:
  • Opportunity to work with a diverse and exciting set of technologies and clients.