ClickUp, a fast-growing SaaS company, is seeking a Staff Site Reliability Engineer to improve the stability, availability, and reliability of its globally distributed, cloud-based infrastructure. The ideal candidate will bring strong SRE discipline and a passion for tackling complex challenges.

The Staff Site Reliability Engineer’s role involves:

  • Designing and building systems for maximum performance and scalability.
  • Collaborating with engineering teams on product design and troubleshooting.
  • Improving stability, observability, and metrics related to uptime.
  • Championing ClickUp's monitoring infrastructure.
  • Implementing and improving site reliability practices.
  • Responding to and troubleshooting downtime events.

ClickUp requires:

  • 4-6+ years of experience with the Amazon Web Services ecosystem (EC2, ECS, VPC, Redis, RDS, ALB, ECR).
  • Experience working with Kubernetes.
  • Experience managing production-critical infrastructure, with a DevOps mindset.
  • Familiarity with SRE best practices and procedures.
  • Experience with IaC (CDK, Terraform) and CI/CD (GitHub Actions, ArgoCD).
  • Familiarity with containerization (Docker).
  • Knowledge of network, firewall, and security best practices.
  • Experience with self-healing automation and monitoring tools (DataDog, CloudWatch).

ClickUp offers:

  • A chance to work in a values-driven company.
  • An environment that supports employees.
  • Opportunity to impact millions of users.

ClickUp

ClickUp is a rapidly expanding SaaS company headquartered in San Diego that provides an all-in-one productivity platform. The platform unifies project management, document collaboration, whiteboards, spreadsheets, and AI. ClickUp aims to revolutionize how people work, helping millions enhance their productivity and save time. The company values ambition, consistent growth, and challenging the status quo, and it fosters a culture of hard work and meritocracy.