Job Description
ABBYY is seeking a Site Reliability Engineering (SRE) Manager to lead its distributed SRE team. The successful candidate will be responsible for ensuring the reliability, scalability, and performance of ABBYY's production systems and for fostering a culture of operational excellence and continuous improvement. This role involves collaborating with engineering, product, and infrastructure teams to design and implement robust systems, driving incident response, and leading initiatives that improve system reliability and developer productivity.

The role involves:
  • Leading and mentoring a team of SREs.
  • Defining and driving the SRE roadmap.
  • Overseeing incident management processes.
  • Collaborating with engineering teams.
  • Implementing and maintaining observability tools.
  • Championing best practices in infrastructure as code.
  • Driving cost optimization and performance tuning.
  • Reporting on system reliability metrics.

Requirements:
  • 5+ years of experience in Site Reliability Engineering, DevOps, or related fields.
  • 2+ years of experience in a leadership or managerial role.
  • Proven experience managing technical teams of 10 or more individuals.
  • Hands-on experience with both AWS and Azure cloud platforms.
  • Demonstrated experience in defining and implementing SLIs and SLOs.
  • Proficiency in infrastructure as code tools (Terraform, CloudFormation, etc.).
  • Experience with container orchestration (Kubernetes, ECS, etc.).
  • Solid understanding of monitoring and observability tools (Prometheus, Grafana, Datadog, etc.).
  • Strong scripting or programming skills (Python, Go, Bash, etc.).
  • Excellent communication and collaboration skills.

ABBYY offers:
  • Remote and hybrid working options.
  • Flexible hours.
  • Paid volunteering days.
  • Paid parental leave.