Arcesium is seeking a Senior Site Reliability Engineer (SRE) to join its Platform Site Reliability Engineering (PSRE) team. This role is crucial for maintaining the stability, reliability, and availability of the company's mission-critical production applications. The Senior SRE will be instrumental in incident management, proactive monitoring, and problem-solving within a high-pressure environment where rapid resolution is essential.

The role involves:
  • Serving as a primary contact for incidents and critical issues, driving effective communication and swift resolution.
  • Continuously monitoring application and infrastructure health, analyzing trends, and implementing preventative measures.
  • Troubleshooting complex technical issues across the stack, identifying root causes, and implementing effective solutions.
  • Collaborating with engineering, development, and operations teams to ensure seamless incident response and proactive reliability initiatives.
  • Automating tasks, improving operational efficiency, and enhancing system resilience.
  • Contributing to the ongoing development and improvement of SRE practices, tools, and processes.

Requirements:
  • Up to 5 years of experience in an SRE, DevOps, or Production Engineering role.
  • Deep understanding of SRE principles and best practices.
  • Incident management expertise.
  • Proficiency in Python or Java.
  • Hands-on experience with Kubernetes (K8s).
  • Cloud experience (AWS preferred) with services like EC2, S3, Lambda, and CloudWatch.
  • Excellent communication skills.
  • Strong troubleshooting skills.
  • Ability to stay calm under pressure and prioritize effectively.
  • Fluency in English.
  • Legal right to work in the country.

Arcesium Offers:
  • Opportunity to impact business-critical operations.

Arcesium LLC