Job Description
Groupon is seeking a Production Service Support Engineer (Incident Management) to join its team. This role is crucial for supporting and optimizing the internal systems that span the business and engineering departments. The ideal candidate will leverage Site Reliability Engineering best practices and the ITIL Solutions Architecture framework to devise incident management strategies. This individual will act as an Incident Commander, change manager, and senior technical resource, taking responsibility for preventing, identifying, triaging, documenting, investigating, mitigating, and recovering from site- and service-impacting incidents across Groupon’s globally dispersed services. They will also facilitate the coordination and resolution of Post Mortems through best practices and oversee Problem Management.
Responsibilities include:
- Leveraging Site Reliability Engineering best practices and the ITIL Solutions Architecture framework to devise incident management strategies.
- Preventing, identifying, triaging, documenting, investigating, mitigating, and recovering from site- and service-impacting incidents.
- Facilitating the coordination and resolution of Post Mortems.
- Overseeing Problem Management.
- Working on engaging projects.
Requirements:
- 4+ years administering Linux system environments.
- 4+ years of experience with web application operations and root cause analysis.
- 4+ years of experience creating Splunk or Kibana search queries.
- 4+ years of experience developing policies and procedures that improve overall production stability.
- Good communication, consulting, and collaboration skills.
- Experience with one or more programming languages (Python, Ruby, Java).
Groupon offers:
- Dedicated project time to work on interesting projects.
- Opportunity to work as part of the Incident Management team.