
Groupon is seeking a Principal Site Reliability Engineer to ensure the performance, availability, and resilience of its platforms. The successful candidate will lead initiatives that redefine operational excellence, collaborate with diverse teams, and mentor other engineers. This role offers an exceptional opportunity to solve complex challenges and shape the future of platform reliability.

As a Principal Site Reliability Engineer at Groupon, the candidate will:

  • Architect and maintain fault-tolerant systems
  • Drive automation in infrastructure management and deployment
  • Create and optimize CI/CD pipelines
  • Build and enhance comprehensive observability solutions
  • Collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets
  • Lead incident response during on-call rotations
  • Design and execute performance testing and capacity planning
  • Proactively identify and resolve bottlenecks
  • Mentor junior engineers
  • Guide architectural decisions that enhance system reliability

The ideal candidate will have:

  • 10+ years in systems engineering, including at least 5 years in SRE or DevOps roles
  • Expertise in cloud platforms (GCP, AWS), containers (Docker), and container orchestration (Kubernetes)
  • Proficiency in programming and scripting languages such as Python, Go, and Bash
  • Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible
  • Deep understanding of networking, DNS, load balancing, and security principles
  • Proven track record of managing high-availability systems in demanding environments
  • Exceptional analytical and problem-solving skills

Groupon offers:

  • The opportunity to work with cutting-edge technologies
  • A collaborative and innovative work culture
  • Professional growth and leadership development pathways
  • A chance to leave a lasting impact
