Job Description
Nozomi Networks is seeking an Observability Engineer to join their Site Reliability Engineering team. The successful candidate will be responsible for the availability, performance, monitoring, and incident response of the company's cloud-based services. This role involves building and driving a culture of observability, creating monitoring tools, collaborating with engineering teams, identifying issues, measuring service performance, defining Service Level Objectives, and participating in on-call duties.
Responsibilities:
- Build and drive a culture of observability within the whole company
- Build or adopt monitoring tools and automation to increase the efficiency of our teams
- Collaborate with our engineering teams in building and maintaining standardised observability practices and instrumentation
- Proactively identify issues and bottlenecks before they happen
- Measure everything: we want to observe our services from local development through CI/CD pipelines to live environments
- Define and implement Service Level Objectives
- Help scale the team by collaborating in finding and hiring new peers when needed
- Participate in the periodic on-call rotation
- Drive post-incident analysis according to our no-blame culture
- Embody the Nozomi Networks Cultural Pillars and our mission to protect what matters most with transparency and trust
Requirements:
- Proven professional experience in observability engineering with distributed systems
- OpenTelemetry is a must: you can easily build, understand, and debug complex collector pipelines
- Previous experience with and knowledge of the Grafana LGTM stack
- Experience defining infrastructure as code using tools such as Terraform or CloudFormation
- Ability to operate in settings with strong confidentiality and data privacy protocols
- Good written and verbal communication skills
- Hands-on experience with AWS, in particular with its monitoring and observability tools
- Good understanding of Kubernetes and related monitoring
- Experience with feature flagging and progressive delivery
- Chaos engineering experience
Benefits:
- Health & Wellness programs
- Financial benefits
- Work-Life Balance initiatives
- Flexible Time-Off