Job Description
Zafin is seeking a Cloud Site Reliability Engineer I to ensure the seamless operation, support, and maintenance of its cloud infrastructure and applications. The role reports to the VP of Cloud Services and involves enhancing system reliability, scalability, and performance. The CSRE I will collaborate with cross-functional teams to ensure exceptional service delivery to clients and stakeholders.
Responsibilities:
- Act as a level-3 technical support expert for Zafin products and Azure cloud issues.
- Collaborate with Product, Platform Engineering, and DevOps teams to introduce operational enhancements and resiliency measures.
- Conduct Root Cause Analysis (RCA) for Severity 1 and 2 incidents.
- Participate in external client escalation calls, providing technical insights and solutions.
- Optimize cloud infrastructure for scalability, performance, and cost-effectiveness.
- Manage container orchestration platforms such as Azure Kubernetes Service (AKS) or OpenShift.
- Enhance monitoring and tracking tools to proactively detect and resolve issues.
- Collaborate with internal teams to implement best practices for Azure cloud deployment and configuration.
- Develop automation scripts for routine operational tasks, incident responses, and cloud cost optimization.
- Maintain detailed documentation of processes, incidents, and cloud architecture.
- Participate in a rotating on-call schedule to ensure 24/7 availability for critical incidents.
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in cloud support, operations, or a related role.
- Hands-on experience with Microsoft Azure (preferred) or other cloud platforms.
- Proficiency in container orchestration platforms like AKS or OpenShift.
- Expertise in automated deployment pipelines, particularly Azure DevOps.
- Familiarity with enterprise monitoring platforms such as Azure Insights, Grafana, or Site24x7.
- Proficiency in scripting languages like PowerShell or Python.
- Proven experience in incident management and maintaining SLAs for critical production environments.
- Knowledge of Postgres databases.
What Zafin Offers:
- Competitive salaries.
- Annual bonus potential.
- Generous paid time off.
- Paid volunteering days.
- Wellness benefits.
- Opportunities for professional growth and career advancement.