Job Description
BigID is seeking a Site Reliability Engineer to join its Engineering team in Hyderabad. The ideal candidate will be responsible for monitoring and responding to system alerts and should have experience with tools such as Datadog. They should also be proficient at efficiently analyzing logs across various dashboards.
Role involves:
- Implementing comprehensive service metrics to track and report on system reliability, performance, and efficiency
- Monitoring system performance, identifying bottlenecks, and optimizing pipelines
- Collaborating with Scrum teams and other stakeholders to identify potential risks
- Conducting post-incident reviews to prevent recurrence and refine the system reliability framework
Requirements:
- A bachelor's or master's degree in computer science, information systems, or a related technical field
- 4 to 7 years of experience as a Site Reliability Engineer
- Proficiency in programming languages such as Python, Go, or Java
- In-depth understanding of operating systems, networking, and cloud services
- Experience with monitoring tools (for example, Datadog, ELK, Redash)
- Proven experience in managing large-scale distributed systems and understanding the principles of scalability and reliability
- Familiarity with DevOps culture and practices, and experience with CI/CD systems
- Excellent diagnostic and problem-solving skills, with the ability to analyze complex systems and data
- Certifications in cloud services, networking, or systems administration are an advantage
Role offers:
- Equity participation
- Hybrid work
- Opportunities for professional growth
- Team fun & company outings
- Statutory and leave benefits
- Health insurance coverage