Job Description
GitLab is seeking a Senior Site Reliability Engineer to join its Runway team, which is building a next-generation platform for rapidly deploying backend services. The ideal candidate will collaborate with cross-functional teams to keep systems reliable, scalable, and performant.
The role involves:
- Designing, implementing, and maintaining infrastructure on both GCP and AWS
- Creating and maintaining Kubernetes tooling, logging, secrets management, and utilities
- Building and improving monitoring, alerting, and logging systems
- Participating in an on-call rotation to address critical issues
- Automating manual processes to increase efficiency and reduce errors
- Leading incident response, including postmortem analysis
- Contributing to capacity planning and cost optimization
Requirements:
- 5+ years of experience in DevOps, SRE, or similar roles
- Strong experience with both GCP and AWS cloud platforms
- Proficiency with Kubernetes and container orchestration
- Solid programming skills in Golang and scripting languages
- Experience designing and implementing logging solutions
- Demonstrated ability to automate infrastructure operations
- Experience with on-call rotations and incident management
- Strong troubleshooting and problem-solving skills
- Excellent communication skills and ability to work in a team
- Comfortable in a fully remote, heavily asynchronous environment across AMER, EMEA, and APAC regions
GitLab offers:
- Benefits to support your health, finances, and well-being
- All-remote, asynchronous work environment
- Flexible Paid Time Off
- Team Member Resource Groups
- Equity Compensation & Employee Stock Purchase Plan
- Growth and development budget
- Parental leave
- Home office support