Job Description
Verisign is seeking a highly skilled Mid-level Site Reliability Engineer (SRE) to join their team and play a critical role in ensuring the stability, performance, and security of their data platforms. The ideal candidate will have a deep understanding of big data systems and automation, be proficient in Infrastructure-as-Code and CI/CD, and possess a strong desire to learn.
This role involves:
- Architecting, deploying, and managing large-scale data platforms (Kafka, Spark, Hadoop, Druid) running on top of Kubernetes
- Automating cluster provisioning (CI/CD), scaling, and monitoring using Ansible, Python, and Jenkins
- Participating in technical designs for software solutions that combine open-source, commercial, and custom-developed components
- Ensuring platform SLOs by collecting, visualizing, and alerting on relevant telemetry
- Upgrading large-scale data platforms to improve system capabilities and security while ensuring minimal customer impact
- Troubleshooting complex issues in large and distributed environments
- Staying up to date with industry best practices and standards for data platforms, focusing on hybrid cloud environments
- Supporting data platform customers
- Participating in the on-call rotation, monitoring production systems, and responding to incidents
Requirements include:
- Bachelor’s degree in computer science or a related technical field, or an equivalent combination of education and experience
- 5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)
- Excellent understanding of Linux configuration and administration
- Strong automation experience
- Strong understanding of infrastructure-as-code tools such as Ansible
- Experience with Docker or Kubernetes in a production environment
- Strong written and verbal communication skills
Verisign offers:
- A dynamic and flexible work environment
- Competitive benefits
- The ability to grow your career