Job Description
xAI is seeking a Hardcore Engineer to join its pre-training infrastructure team. This role involves designing, building, and implementing large-scale distributed training systems, profiling and optimizing GPU utilization, and contributing to hardware/software co-design. The engineer will also maintain and innovate on the codebase and build tools to enhance team productivity. This role is based in the Bay Area (San Francisco and Palo Alto).
Responsibilities:
  • Design, build, and implement large-scale distributed training systems.
  • Profile, debug, and optimize multi-host GPU utilization.
  • Participate in hardware/software/algorithm co-design.
  • Maintain and innovate on the codebase.
  • Build tools to boost the productivity of the team.
Requirements:
  • Experience in configuring and troubleshooting operating systems for maximum performance.
  • Experience building scalable training frameworks for AI models in HPC clusters.
  • Familiarity with scalable orchestration frameworks and tools.
  • Knowledge of machine learning compilers and runtimes such as XLA, MLIR, and Triton.
  • Understanding of distributed training strategies such as FSDP, Megatron, and pipeline parallelism.
  • Experience with NCCL or custom communication libraries.
xAI offers:
  • Opportunity to work on cutting-edge AI systems.
  • A flat organizational structure with hands-on contributions expected.
  • A collaborative environment that values curiosity and engineering excellence.

xAI

xAI is an artificial intelligence company focused on building AI systems that deeply understand the universe and assist humanity in its quest for knowledge. It operates with a flat organizational structure that values engineering excellence, curiosity, and strong communication. xAI fosters a collaborative environment where every team member contributes directly to the company’s objectives, with a focus on continuous improvement.
