Job Description
Together AI is seeking a Senior Backend Engineer to contribute to its AI Acceleration Cloud, an end-to-end platform designed for the generative AI lifecycle. The successful candidate will be instrumental in developing a highly available, global, and fast cloud infrastructure that virtualizes cutting-edge ML hardware and empowers ML practitioners with self-serve AI cloud services. Together AI is looking for someone with strong software development skills, systems knowledge, and excellent communication abilities.
Responsibilities:
- Perform architecture and research work for decentralized AI workloads
- Work on the core, open-source Together AI platform
- Create services, tools, and developer documentation
- Create testing frameworks for robustness and fault-tolerance
Requirements:
- 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
- 5+ years of experience writing high-performance, well-tested, production-quality code
- Demonstrated experience with building and operating high-performance and/or globally distributed micro-service architectures across one or more cloud providers (AWS, Azure, GCP)
- Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
- Deep experience with Kubernetes internals a big plus
- Deep experience with VMs/hypervisors a big plus
- Deep experience with data center (DC) networking technologies and solutions a big plus
- Experience with Cluster API or similar a big plus
- Experience working on high-performance compute, networking, and/or storage a big plus
- Experience virtualizing GPUs and/or InfiniBand a big plus
- Strong systems knowledge across compute, networking, and storage
- Experience with infrastructure automation tools, monitoring/observability stacks, and CI/CD pipelines
- Experience building IaaS or PaaS systems at scale a plus
- Experience with DPUs/SmartNICs a plus
- GPU programming, NCCL, CUDA knowledge a plus
What Together AI Offers:
- Competitive compensation
- Startup equity
- Health insurance
- Flexibility in terms of remote work