The Site Reliability Engineering (SRE) team architects, builds, and maintains the infrastructure that applications rely on. The role involves close collaboration with development teams to keep that infrastructure scalable, reliable, and efficient.
What the role involves:
- Deploy, automate, maintain, and manage various cloud-based and on-premises production systems.
- Document new and existing requirements to ensure smooth project delivery.
- Work with security teams to adopt security best practices.
- Ensure the availability, performance, scalability, and security of production systems.
- Troubleshoot and resolve system issues.
- Suggest architectural improvements and recommend process optimizations.
- Evaluate new technologies to enhance the infrastructure stack.
- Implement automated provisioning and scaling of servers.
- Handle operational tasks, including on-call duties, alerts, and incident management.
Requirements:
- Minimum 2 years of engineering experience.
- Bachelor’s or Master’s degree in a relevant field (e.g., IT, Computer Science) or a proven DevOps track record.
- Willingness to continuously upgrade skills and stay up to date with DevOps trends.
- Experience with cloud-native tools (e.g., Kubernetes, Docker, Nginx, OpenTelemetry) is a plus.
- Experience managing cloud servers (AWS, GCP).
- Experience with on-premises physical servers, storage solutions, and databases (MySQL, PostgreSQL, Redis) is a plus.
- Familiarity with Infrastructure as Code (IaC) tools (Terraform, Pulumi).
What the role offers:
- Opportunity to work on rock-solid infrastructure.
- Chance to collaborate with development teams.
- Exposure to cloud-native tools and technologies.