The Site Reliability Engineering (SRE) team designs, builds, and maintains the infrastructure that applications depend on. SREs work closely with development teams to keep systems scalable, reliable, and efficient, so that customers get a dependable experience and developers can focus on building features.
What This Role Involves:
- Deploying, automating, maintaining, and managing cloud-based and on-premises production systems.
- Understanding architecture and documenting requirements for smooth project delivery.
- Collaborating with security and infrastructure teams to adopt security best practices.
- Ensuring availability, performance, scalability, and security of production systems.
- Troubleshooting systems and resolving problems.
- Suggesting architecture and process improvements.
- Evaluating new technologies to improve the infrastructure stack.
- Ensuring that violations of system security policies are properly remediated.
- Driving and implementing auto-provisioning and scaling using automation tools.
- Handling operational tasks such as on-call duties and incident management.
Requirements:
- Minimum of 2 years of professional experience in engineering.
- Bachelor's or Master's degree in a relevant field (Information Technology, Computer Science) or equivalent DevOps experience.
- Proficiency in working with cloud servers on AWS and GCP.
- Experience with cloud-native tools (Kubernetes, Docker, Nginx, OpenTelemetry).
- Proficiency in Linux system management.
- Familiarity with databases and storage solutions (MySQL, Postgres, Redis) is beneficial.
- Interest in engineering management is a plus.