Job Description
Verisign is seeking a highly skilled Mid-level Site Reliability Engineer (SRE) to join their team and play a critical role in ensuring the stability, performance, and security of their data platforms. The ideal candidate will have a deep understanding of big data systems and automation, be proficient in Infrastructure-as-Code and CI/CD, and possess a strong desire to learn.
This role involves:
  • Architecting, deploying, and managing large-scale data platforms (Kafka, Spark, Hadoop, Druid) running on top of Kubernetes
  • Automating cluster provisioning (CI/CD), scaling, and monitoring using Ansible, Python, and Jenkins
  • Participating in technical designs for software solutions that combine open-source, commercial, and custom-developed components
  • Ensuring platform SLOs by collecting, visualizing, and alerting on relevant telemetry
  • Upgrading large-scale data platforms to improve system capabilities and security while minimizing customer impact
  • Troubleshooting complex issues in large and distributed environments
  • Staying up to date with industry best practices and standards for data platforms, focusing on hybrid cloud environments
  • Supporting data platform customers
  • Participating in the on-call rotation, monitoring production systems, and responding to incidents
Requirements include:
  • Bachelor’s degree in computer science or a related technical field, or equivalent combination of education and experience
  • 5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)
  • Excellent understanding of Linux configuration and administration
  • Strong automation experience
  • Strong understanding of infrastructure-as-code tools such as Ansible
  • Experience with Docker or Kubernetes in a production environment
  • Strong written and verbal communication skills
Verisign offers:
  • A dynamic and flexible work environment
  • Competitive benefits
  • The ability to grow your career