Site Reliability Engineer (Amsterdam)

Site Reliability Engineer role at Together AI in Amsterdam.

Job Description

Together AI is seeking a Site Reliability Engineer to ensure the smooth operation of user-facing services and production systems. The ideal candidate combines operational skills with software engineering principles, applying automation to improve both the operating environments and the codebase. This role specializes in systems, implementing best practices for availability, reliability, and scalability.

Responsibilities include:

Participating in an on-call (PagerDuty) rotation for incident response.
Building and managing infrastructure using Ansible, Terraform, and Kubernetes.
Developing monitoring systems to maintain high service quality.
Designing and implementing operational processes for deployments and upgrades.
Debugging production issues across all services.
Identifying product architecture improvements for reliability, performance, and availability.
Planning infrastructure growth.

Requirements:

7+ years of SRE or related experience.
Bachelor's degree in Computer Science or related field, or equivalent experience.
Expert knowledge of Ansible, Terraform, and Kubernetes.
Proficiency in programming/scripting languages.
Direct experience with monitoring and observability practices.
Advanced knowledge of cloud services.
Ability to thrive in a collaborative environment.

Together AI offers:

Opportunity to work in a research-driven AI company.
Chance to contribute to open-source research and advancements.
Opportunity to join a passionate team building next-generation AI infrastructure.
