Job Description

Groupon is seeking a Production Service Support Engineer (Incident Management) to join its team. This role is crucial for supporting and optimizing the internal systems that bridge the business and engineering departments. The ideal candidate will leverage Site Reliability Engineering best practices and the ITIL Solutions Architecture framework to devise incident management strategies.

As an Incident Commander, the engineer will be responsible for preventing, identifying, triaging, documenting, investigating, mitigating, and recovering from site- or service-impacting incidents across Groupon’s globally dispersed services. They will also facilitate the coordination and resolution of Post Mortems and oversee Problem Management.

The role offers dedicated time for engaging projects. The engineer works within the Incident Management team on a Monday-Friday shift, with one weekend of primary on-call duty every 6 weeks.

Responsibilities include:

  • Leveraging Site Reliability Engineering best practices and the ITIL Solutions Architecture framework to devise incident management strategies.
  • Preventing, identifying, triaging, documenting, investigating, mitigating, and recovering from site- or service-impacting incidents.
  • Facilitating the coordination and resolution of Post Mortems.
  • Overseeing Problem Management.

Requirements:

  • 4+ years of experience administering Linux system environments.
  • 4+ years of experience with web application operations and root cause analysis.
  • 4+ years of experience creating Splunk or Kibana search queries.
  • 4+ years of experience developing policies and procedures that improve overall production stability.
  • Good communication, consulting, and collaboration skills.
  • Experience with one or more programming languages (Python, Ruby, Java).

Groupon offers:

  • A culture that inspires innovation, rewards risk-taking, and celebrates success.
  • Autonomy and the opportunity to make a meaningful impact.