Job Description
xAI is seeking an Infrastructure Engineer to operate some of the world’s largest GPU supercomputing clusters, which are used for AI training and for serving production models. The ideal candidate will implement Infrastructure as Code (IaC) best practices, enhance deployment pipelines, and ensure robust, secure service delivery across production environments, working with both on-premise clusters and cloud providers. The role also involves supporting security best practices for internal researchers and for live external traffic.

Responsibilities:
  • Operating GPU supercomputing clusters for AI training and serving.
  • Implementing Infrastructure as Code (IaC) best practices.
  • Enhancing deployment pipelines.
  • Ensuring robust, secure service delivery.
  • Working with on-premise clusters and cloud providers.
  • Assisting with security best practices for internal researchers and live external traffic.

Requirements:
  • Experience writing scalable and highly available containerized applications in Rust.
  • Experience managing compute fleets with Pulumi, Terraform, Ansible, or other stateful automation libraries.

xAI offers:
  • Opportunity to work on large-scale GPU supercomputing clusters.
  • A flat organizational structure.
  • A challenging and engaging environment.

xAI

xAI is an artificial intelligence company focused on building AI systems that deeply understand the universe and assist humanity in its quest for knowledge. It operates with a flat organizational structure that values engineering excellence, curiosity, and strong communication. xAI fosters a collaborative environment where every team member contributes directly to the company’s objectives, with a focus on continuous improvement.
