Job Description

xAI is seeking an RDMA Engineer to join its Supercomputing team. The company's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. The ideal candidate will design and optimize low-latency, high-bandwidth networking solutions using NVIDIA’s RDMA-capable technologies to support some of the world’s largest GPU supercomputing clusters.

The RDMA Engineer will focus on developing and tuning RDMA-based communication systems, implementing and optimizing GPUDirect RDMA, integrating RDMA solutions with Kubernetes-based workloads, collaborating with AI researchers, and troubleshooting performance bottlenecks.

Responsibilities:

  • Develop and tune RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
  • Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
  • Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
  • Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
  • Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.

Requirements:

  • Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
  • Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
  • Familiarity with NVIDIA’s networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and kernel modules (e.g., nv_peer_memory / nvidia-peermem).
  • Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
  • Knowledge of Kubernetes networking and integrating RDMA into containerized environments.

The role offers:

  • Opportunity to work on cutting-edge technologies in AI supercomputing.
  • Collaboration with a highly motivated and skilled team.

xAI

xAI is an artificial intelligence company focused on building AI systems that deeply understand the universe and assist humanity in its quest for knowledge. It operates with a flat organizational structure that values engineering excellence, curiosity, and strong communication. xAI fosters a collaborative environment where every team member contributes directly to the company’s objectives, with a focus on continuous improvement.
