Job Description
Nozomi Networks is seeking an Observability Engineer to join its Site Reliability Engineering team. The role is based in Mendrisio, at the company's EU Headquarters. The successful candidate will be responsible for the availability, performance, monitoring, and incident response of the company's cloud-based services. They will drive a culture of observability and collaborate with engineering teams to maintain standardized practices.
The role involves:
- Building and driving a culture of observability
- Building or adopting monitoring tools and automation
- Collaborating with engineering teams on observability practices
- Identifying issues and bottlenecks proactively
- Measuring services from development to live environments
- Defining and implementing Service Level Objectives
- Scaling the team and participating in on-call duties
- Driving post-incident analysis
Requirements:
- Proven experience in observability engineering with distributed systems
- Proficiency with OpenTelemetry
- Experience with the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir)
- Experience with infrastructure as code tools like Terraform or CloudFormation
- Ability to work under strict confidentiality and data privacy protocols
- Good communication skills
- Hands-on experience with AWS monitoring and observability tools
- Good understanding of Kubernetes and how to monitor it
- Experience with feature flagging and progressive delivery
- Chaos engineering experience
Nozomi Networks offers:
- Health & Wellness benefits
- Financial benefits
- Work-Life Balance benefits
- Flexible Time-Off