Job Description
Altana is seeking a Staff Site Reliability Engineer to ensure the availability, performance, and scalability of its critical production services. The role embeds reliability into Altana's architecture and operations through automation and proactive monitoring. The Staff Site Reliability Engineer will work closely with engineering teams to influence system design and help build robust infrastructure.
Responsibilities include:
- Championing and implementing SRE principles.
- Designing and maintaining advanced monitoring solutions.
- Automating repetitive operational tasks.
- Participating in incident response and leading postmortems.
- Collaborating with development teams to optimize system design.
- Participating in an on-call rotation.
- Maintaining reliability for data pipelines.
Requirements include:
- 5+ years of experience in SRE, DevOps, or a similar role.
- Strong understanding of SRE principles.
- Expertise in observability platforms.
- Proficiency in a programming/scripting language.
- Experience with cloud platforms (AWS, Azure, or GCP).
- Experience with containerization technologies (Docker, Kubernetes).
- Experience with Infrastructure as Code (IaC) tools.
- Proven experience in incident management.
- Knowledge of microservices architectures and CI/CD pipelines.
- Excellent problem-solving skills.
- Strong communication skills.
- Experience with data engineering concepts.
Altana offers:
- Flexible Time Off
- Paid Parental Leave
- Health Benefits
- Supplemental Benefits
- 401(k) Savings
- Commuter Benefits
- Wellness programs
- Pet Insurance
- Employee Assistance Program
- Dependent Care FSA