Job Description

xAI is seeking a Site Reliability Engineer to join the team responsible for the backend services powering grok.com and its API. The company’s mission is to create AI systems that accurately understand the universe and aid humanity. xAI operates with a flat organizational structure where all employees are expected to be hands-on and contribute directly to the company’s mission.

The Site Reliability Engineer will work with a team primarily based in London, with a growing presence in Palo Alto. The team focuses on writing highly scalable and reliable services that can efficiently process tens of thousands of queries per second, hosted on Kubernetes clusters both on-premises and in the cloud.

Responsibilities include:

  • Maintaining and improving the reliability and scalability of backend services.
  • Working with Kubernetes clusters.
  • Monitoring and troubleshooting production issues.
  • Collaborating with other engineers to improve the overall system architecture.

Requirements:

  • Expert knowledge of Kubernetes.
  • Expert knowledge of continuous deployment systems such as Buildkite and ArgoCD.
  • Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty.
  • Expert knowledge of infrastructure-as-code technologies such as Pulumi or Terraform.

xAI offers:

  • Competitive cash-based compensation.
  • xAI equity.
  • Private health and dental insurance.

xAI

xAI is an artificial intelligence company focused on building AI systems that deeply understand the universe and assist humanity in its quest for knowledge. It operates with a flat organizational structure that values engineering excellence, curiosity, and strong communication. xAI fosters a collaborative environment where every team member contributes directly to the company’s objectives, with a focus on continuous improvement.
