Job Description
GitLab is seeking a Senior Site Reliability Engineer to join its Environment Automation team. This role is crucial for maintaining the smooth operation of GitLab's user-facing services and production systems. The ideal candidate will blend operational pragmatism with software craftsmanship, applying engineering principles and automation to GitLab's environments and codebase.
Responsibilities:
- Automate operational tasks such as package updates and configuration changes.
- Develop and maintain early warning systems for reliable maintenance.
- Plan monitoring and alerting systems to predict capacity needs.
- Respond to user emergencies, platform alerts, and support requests.
- Enhance security measures for GitLab infrastructure.
- Partner with internal and external compliance assessors.
- Collaborate with engineering stakeholders to resolve architectural bottlenecks.
Requirements:
- Experience with Infrastructure as Code technologies, especially Terraform.
- Ability to reason about large systems and their operation at scale.
- Comfortable programming in Go or Ruby, with hands-on experience in Go.
- Experience interacting with customers and resolving their requests.
GitLab offers:
- Opportunity to work on core GitLab projects.
- Chance to write infrastructure automation with Ansible and Terraform.
- Exposure to GitLab's observability stack (e.g., ELK, Prometheus).
- Interaction with various cloud provider systems (e.g., GCP, AWS).