Job Description

Zafin is seeking a Cloud Site Reliability Engineer II (CSRE II) to lead strategic initiatives that ensure the reliability, scalability, and performance of its cloud infrastructure and applications. This role requires mastery of cloud technologies, strategic planning, and incident management to drive innovative solutions and operational excellence. The CSRE II will influence the direction of cloud reliability strategies, mentor junior engineers, and lead significant projects that have a broad organizational impact.

Role Involves:

  • Leading and managing the resolution of complex technical issues involving Zafin’s products and Azure cloud environment.
  • Designing and implementing strategic operational enhancements to improve resiliency and system reliability.
  • Conducting in-depth Root Cause Analysis (RCA) for high-severity incidents and driving initiatives to reduce error recurrence.
  • Representing the organization in external client escalation calls, providing expert guidance and solutions.
  • Architecting and optimizing cloud infrastructure for high performance, scalability, and cost-effectiveness.
  • Providing thought leadership in managing and scaling container orchestration platforms such as AKS and OpenShift.
  • Overseeing the implementation of advanced monitoring solutions and integrating predictive analytics for proactive issue resolution.
  • Developing and executing automation strategies to streamline operational workflows and incident responses.
  • Creating and maintaining comprehensive documentation of cloud architectures, processes, and incident management strategies.
  • Mentoring and coaching junior engineers, fostering a culture of continuous learning and innovation.
  • Driving strategic initiatives in collaboration with cross-functional teams to achieve organizational objectives.

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s degree preferred).
  • 12+ years of experience in cloud support, operations, or a related role.
  • Advanced expertise in Microsoft Azure (preferred) or equivalent cloud platforms.
  • Demonstrated experience in designing and scaling container orchestration systems like AKS or OpenShift.
  • Proven leadership in managing automated deployment pipelines, including Azure DevOps.
  • Mastery of enterprise monitoring platforms (e.g., Azure Insights, Grafana) and predictive analytics tools.
  • Advanced scripting skills with PowerShell, Python, or similar languages.
  • Extensive experience in incident management and defining SLAs for global production environments.
  • In-depth knowledge of database management, particularly Postgres.

What Zafin Offers:

  • Competitive salaries.
  • Annual bonus potential.
  • Generous paid time off.
  • Paid volunteering days.
  • Wellness benefits.
  • Robust opportunities for professional growth and career advancement.