Job Description
Snyk is seeking a Senior Site Reliability Engineer to join its team in Lisbon, Portugal. This role involves building scalable, reliable, and secure cloud infrastructure to support the company's hypergrowth. The ideal candidate will ensure the performance and uptime of Snyk's systems while applying DevOps best practices and modern tooling.
Responsibilities include:
- Designing, deploying, and maintaining infrastructure on AWS.
- Managing Kubernetes clusters across multiple environments.
- Utilizing ArgoCD, Kustomize, and Helm for continuous deployment and GitOps workflows.
- Implementing and managing monitoring and alerting systems using Prometheus, Grafana, and custom exporters.
- Maintaining centralized logging and observability using Graylog and OpenSearch.
- Automating infrastructure provisioning with Terraform and custom scripting.
- Implementing networking best practices.
- Troubleshooting complex system issues.
- Ensuring high availability, scalability, and disaster recovery.
- Collaborating with development and operations teams.
Requirements:
- Strong hands-on experience with AWS services.
- Deep understanding of Kubernetes architecture.
- Experience with Cloudflare products.
- Proficiency in the Prometheus + Grafana monitoring stack.
- Strong experience with Calico for managing Kubernetes network policies.
- Solid experience with Graylog and OpenSearch.
- Proficiency with Infrastructure as Code tools, especially Terraform, Kustomize, and Helm.
- Experience with CI/CD pipelines and GitOps practices using ArgoCD.
- Strong scripting and automation skills in Bash and/or Python.
- Solid knowledge of networking principles.
Snyk offers:
- Flexible working hours and work-from-home allowances.
- Generous vacation and wellness time off.
- Health benefits and employee assistance plans.
- Country-specific life insurance, disability benefits, and retirement/pension programs.