Job Description
Talkdesk is seeking a Senior Site Reliability Engineer I to enhance Developer Experience. The role involves designing, building, and maintaining high-performance, scalable, and reliable services. The engineer will play a critical role in ensuring that teams have the tools, practices, and expertise they need to develop software in a blame-free culture. The mission is to improve developer experience by providing tools to manage the entire software lifecycle.
Talkdesk is building its own internal PaaS, a vital part of its engineering efforts, using technologies such as Kubernetes, Prometheus, and Kotlin.
Responsibilities:
- Design, build, harden, and maintain key parts of the internal platform.
- Help migrate to industry-leading CI/CD tools such as GitHub Actions.
- Automate safe deployment practices using tools like ArgoCD, Argo Rollouts, and Helm Charts.
- Automate infrastructure provisioning and other engineering processes.
- Coach and upskill other engineering team members.
- Solve challenging technical problems.
- Develop effective tooling, alerts, and responses to address reliability risks.
- Drive and promote protocols for production readiness and operational excellence.
- Partner with product engineering teams to debug production outages.
- Plan for the growth of Talkdesk’s infrastructure.
Skills and Qualifications:
- 5-8 years of experience.
- Understanding of large-scale, complex systems from a reliability perspective.
- Experience with Kubernetes.
- Experience with Infrastructure as Code tools such as Terraform and Ansible.
- Experience building software in a programming language such as Java, Kotlin, Scala, or another JVM-based language.
- Experience writing scripts to automate tasks in a language such as Ruby, Python, Bash, or another scripting language.
- Experience with at least one relational and one non-relational database (e.g., PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch).
Nice-to-haves:
- Experience with cloud platforms such as Amazon Web Services (AWS), Google Cloud, or Microsoft Azure.
- Experience with CI/CD platforms (e.g., Jenkins, GitLab CI), containers (e.g., Docker, Kubernetes), and artifact management tools (e.g., Nexus, ECR).
- Experience with the Go programming language.