Groupon is seeking a Principal Site Reliability Engineer to ensure the performance, availability, and resilience of its platforms. The role involves leading initiatives to redefine operational excellence and collaborating with teams to implement cutting-edge technologies. This is an opportunity to shape the future of platform reliability.
The Role Involves:
  • Architecting and maintaining fault-tolerant systems with uptime SLAs of 99.9% or higher.
  • Driving automation in infrastructure management and deployment using Terraform, Ansible, and Kubernetes.
  • Creating and optimizing CI/CD pipelines for reliable software delivery.
  • Building comprehensive observability solutions.
  • Collaborating with teams to define SLIs, SLOs, and error budgets.
  • Leading incident response and root cause analysis.
  • Designing and executing performance testing and scalability strategies.
  • Mentoring junior engineers.
  • Guiding architectural decisions.
Requirements:
  • 10+ years in systems engineering, with 5+ years in SRE or DevOps roles.
  • Expertise in cloud platforms (GCP, AWS), containerization (Docker), and container orchestration (Kubernetes).
  • Proficiency in programming and scripting languages such as Python, Go, and Bash.
  • Advanced knowledge of Infrastructure as Code (Terraform, Ansible).
  • Understanding of networking, DNS, load balancing, and security principles.
  • Proven track record of managing high-availability systems.
  • Exceptional analytical and problem-solving skills.
What Groupon Offers:
  • Opportunity to work with cutting-edge technologies.
  • Collaborative and innovative work culture.
  • Professional growth and leadership development pathways.
  • Chance to leave a lasting impact.