Job Description
Reddit is seeking a Senior Site Reliability Engineer to enhance the reliability and performance of its engineering platforms and services. The successful candidate will work within Reddit's Infrastructure SRE team, collaborating closely with Compute, Traffic, and Observability infrastructure teams. This role involves owning a suite of tools, primarily based on open-source solutions like Prometheus, Thanos, and Grafana, to help engineers understand their creations at scale. The engineer will also take ownership of risk management, ensuring system resilience and performance.
Responsibilities include:
- Advising engineering teams on designing resilient and high-performance systems.
- Amplifying capabilities by building them into foundational Infrastructure and Platform services.
- Automating repetitive, manual, or risky tasks.
- Diagnosing and fixing network, system, and service-level issues.
- Optimizing performance, reducing cost, and improving user experience.
Requirements:
- 5+ years of experience in Software Engineering, Site Reliability Engineering, or a development-focused DevOps role.
- Proficiency in one or more programming languages (Go and Python preferred).
- Experience with Kubernetes and Cloud systems.
- Familiarity with distributed systems development and tools like Prometheus, Thanos, and Grafana.
- Experience with high-traffic backend systems.
- Strong debugging, troubleshooting, and optimization skills.
- Strong working knowledge of Linux and containers.
- Excellent communication and collaborative skills.
Reddit offers:
- Pension Scheme
- Private Medical and Dental Scheme
- Life Assurance, Income Protection
- Workspace benefit for your home office
- Personal & Professional development funds
- Family Planning Support
- Commuter Benefits
- Flexible Vacation & Reddit Global Days Off