High-Performance Networking Engineer - Supercomputing

High-Performance Networking Engineer to design and optimize low-latency networking solutions.

Job Description

xAI is seeking a High-Performance Networking Engineer to join its Supercomputing team. In this role, the individual will be responsible for designing and optimizing low-latency, high-bandwidth networking solutions using NVIDIA’s RDMA-capable technologies to support some of the world’s largest GPU supercomputing clusters. These clusters drive AI training and inference workloads, demanding cutting-edge performance and scalability. xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

Role involves:

Developing and tuning RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
Implementing and optimizing GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
Integrating RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
Collaborating with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
Troubleshooting and resolving performance bottlenecks in high-throughput, low-latency networking environments.

Requirements:

Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
Familiarity with NVIDIA’s networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and tools (e.g., NVPeerMemory).
Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
Knowledge of Kubernetes networking and integrating RDMA into containerized environments.

Role offers:

Opportunity to work on cutting-edge networking solutions for AI supercomputing.
Collaboration with a highly motivated and focused team.

xAI

xAI is an artificial intelligence company focused on building AI systems that deeply understand the universe and assist humanity in its quest for knowledge. It operates with a flat organizational structure that values engineering excellence, curiosity, and strong communication. xAI fosters a collaborative environment where every team member contributes directly to the company’s objectives, with a focus on continuous improvement.
