Job Description
The Network and Systems Operations (NSO) Engineer role is at a technology company focused on family safety and connectivity. The company operates in a remote-first environment. The NSO Team is part of Cloud Operations and supports over 325 engineers. The team focuses on observability infrastructure, tooling, L1 service support, and incident management. Day to day, the role involves monitoring systems, responding to alerts, and executing runbooks to resolve problems.
The role involves:
- Creating and maintaining observability infrastructure and tooling using Prometheus, Grafana, and Datadog.
- Serving as a member of L1 support, answering pages, and resolving or escalating issues.
- Responding to alerts in PagerDuty and driving incidents to conclusion.
- Coordinating cross-team and cross-functional efforts to ensure operational excellence.
Requirements:
- Bachelor's in Computer Science, Engineering, or related field.
- 5+ years of experience writing and debugging code in Java, Python, Shell, or Ruby.
- 5+ years of experience with large-scale distributed systems and managing Linux-based systems in AWS.
- Experience with observability and reporting systems (New Relic, Datadog, Elastic, Prometheus, etc.).
- 3+ years of experience with Docker, Kubernetes, system virtualization, cloud monitoring, and logging.
- 3+ years of experience with infrastructure as code (IaC) and configuration management tools such as Terraform, CloudFormation, Chef, and Ansible.
- Experience working as part of a team, along with analytical and problem-solving skills.
What the role offers:
- Technical and non-technical training.
- Internal conferences and meetups.
- Support and mentorship from an experienced employee.
- Health insurance.
- English courses.
- Sports activities.
- Flexible work options (remote and hybrid).
- Referral program.
- Work anniversary program and additional vacation days.