Job Description
xAI is seeking a Storage Systems Engineer to join its Supercomputing team. This role involves designing, building, and optimizing high-performance storage systems for large GPU supercomputing clusters that support AI training and inference workloads. The ideal candidate will ensure extreme reliability, scalability, and low-latency data access.
Responsibilities:
  • Architecting and implementing distributed storage solutions for massive AI workloads.
  • Optimizing storage performance for high-throughput and low-latency access.
  • Collaborating with infrastructure teams to enhance deployment pipelines using Infrastructure-as-Code (IaC).
  • Monitoring and maintaining storage systems across on-premises clusters and cloud environments.
  • Contributing to capacity planning and data durability strategies.
Requirements:
  • Experience designing and operating distributed storage systems and file systems (e.g., Ceph, Lustre, or ZFS) at scale.
  • Hands-on experience with storage hardware (NVMe, SSD, HDD) and tuning I/O performance.
  • Proficiency in writing scalable, high-performance code in Rust or Go.
  • Experience managing storage infrastructure with IaC tools like Pulumi, Terraform, or Ansible.
  • Familiarity with Kubernetes storage primitives and integrating storage with containerized workloads.
xAI offers:
  • Opportunity to work on cutting-edge AI infrastructure.
  • A flat organizational structure with opportunities for leadership.
  • A collaborative and motivated team environment.

xAI

xAI is an artificial intelligence company focused on building AI systems that deeply understand the universe and assist humanity in its quest for knowledge. It operates with a flat organizational structure that values engineering excellence, curiosity, and strong communication. xAI fosters a collaborative environment where every team member contributes directly to the company’s objectives, with a focus on continuous improvement.
