Job Description
xAI is seeking a Software Engineer to join its Network Software and Services for AI (nssAI) team. This team builds the software, services, and frameworks that xAI's Network Development Engineers use to design, deploy, operate, and monitor the company's network infrastructure. The role covers all aspects of network management, including metric collection, configuration management and deployment, zero-touch provisioning, network monitoring, and alarming and auto-remediation. The Software Engineer will focus on building software and tools with extensive metrics coverage for the GPU supercomputing network fabrics used for AI training and for serving customer inference queries. The candidate will implement infrastructure-as-code (IaC) best practices, enhance deployment pipelines, and ensure robust, secure service delivery across production environments.
Responsibilities:
- Collaborating with network engineers daily.
- Designing scalable and reliable software for orchestrating network devices.
- Creating metrics to help prioritize focus.
Requirements:
- Deep experience collaborating with network engineers.
- Expert knowledge of network topologies and protocols.
- Proven track record of designing scalable and reliable software.
- Ability to thrive in ambiguity.
- Proficiency in Python and Go.
- Knowledge of TCP/IP, BGP, and RDMA.
xAI offers:
- Opportunity to work on some of the world’s largest GPU supercomputing network fabrics.
- A flat organizational structure.
- A challenging and rewarding work environment.