xAI, dedicated to creating AI systems that understand the universe, seeks an AI/HPC Network Development Engineer. The engineer will optimize network performance and availability at hyperscale. xAI was the first to build a 100k GPU cluster on an ethernet network and seeks to repeat the process. The ideal candidate will have deep experience in RoCEv2 and can develop at a hyper scale.
The role involves working within NCCL, building metric dashboards, and tweaking configurations. The engineer will also help design the next iteration of xAI's backend and front-end networks. There will be significant travel to Memphis for building more capacity.
Responsibilities:
Requirements:
xAI is an artificial intelligence company focused on building AI systems that deeply understand the universe and assist humanity in its quest for knowledge. It operates with a flat organizational structure that values engineering excellence, curiosity, and strong communication. xAI fosters a collaborative environment where every team member contributes directly to the company’s objectives, with a focus on continuous improvement.
All Jobs at xAI (129)