GitLab is seeking an Intermediate Site Reliability Engineer, Observability, to join its team. This is a fully remote role open to candidates based in Canada. The Observability Team's mission is to build, run, and own the entire lifecycle of the suite of services that enable observability of the GitLab SaaS environments. This position includes participation in an on-call rotation.
Responsibilities:
Take a Platform-first approach to solving problems.
Maintain metrics environment and related tools and processes.
Develop monitoring and alerting systems for capacity planning.
Respond to incidents as part of an on-call rotation.
Act as a Subject Matter Expert for metrics gathering, observability guidelines, and capacity planning.
Collaborate with engineering stakeholders to resolve architectural bottlenecks.
Requirements:
Experience with the Infrastructure as Code technologies and libraries that power GitLab.
Experience with Grafana's LGTM stack (Loki, Grafana, Tempo, Mimir) or the Elastic Stack (ELK: Elasticsearch, Logstash, Kibana).
Ability to reason about large systems and their operation at scale.
Enjoy working with peers and collaborating across teams.
Ability to leverage GitLab as a primary tool.
Share GitLab's values.
GitLab Offers:
All remote, asynchronous work environment
Flexible Paid Time Off
Equity Compensation & Employee Stock Purchase Plan