Groupon is seeking a Principal Site Reliability Engineer to ensure the performance, availability, and resilience of its platforms. The role involves leading initiatives that redefine operational excellence and collaborating with diverse teams to implement cutting-edge technologies and best practices. The ideal candidate will foster a culture of reliability and mentor other engineers.

The Principal Site Reliability Engineer will be responsible for:

  • Architecting and maintaining fault-tolerant systems.
  • Driving automation in infrastructure management and deployment.
  • Creating and optimizing CI/CD pipelines.
  • Building and enhancing comprehensive observability solutions.
  • Collaborating with stakeholders to define SLIs, SLOs, and error budgets.
  • Leading incident response.
  • Designing and executing performance testing, capacity planning, and scalability strategies.
  • Proactively identifying and resolving bottlenecks.
  • Mentoring junior engineers.
  • Guiding architectural decisions.

The position requires:

  • 10+ years in systems engineering, with at least 5 years in SRE or DevOps roles.
  • Expertise in cloud platforms (GCP, AWS) and in containerization and orchestration (Docker, Kubernetes).
  • Proficiency in programming and scripting languages such as Python, Go, and Bash.
  • Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
  • Deep understanding of networking, DNS, load balancing, and security principles.
  • Proven track record of managing high-availability systems in demanding environments.
  • Exceptional analytical and problem-solving skills.

Groupon offers:

  • The opportunity to work with cutting-edge technologies.
  • A collaborative and innovative work culture.
  • Professional growth and leadership development pathways.