Job Description
Together AI is seeking a Senior Software Engineer to join its team and contribute to the development of the AI Acceleration Cloud platform. This platform is designed to be an end-to-end solution for the generative AI lifecycle, combining a fast LLM inference engine with advanced AI cloud infrastructure. The engineer will be responsible for building a highly available, global, and fast cloud infrastructure that virtualizes ML hardware and provides self-serve AI cloud services to ML practitioners.
Responsibilities:
  • Perform architecture and research work for decentralized AI workloads.
  • Work on the core, open-source Together AI platform.
  • Create services, tools, and developer documentation.
  • Create testing frameworks for robustness and fault-tolerance.
Requirements:
  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired).
  • 5+ years of experience writing high-performance, well-tested, production-quality code.
  • Demonstrated experience with building and operating high-performance and/or globally distributed micro-service architectures across one or more cloud providers (AWS, Azure, GCP).
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members.
  • Deep experience with Kubernetes internals a big plus.
  • Deep experience with VMs/hypervisors a big plus.
  • Deep experience with data center networking technologies and solutions a big plus.
  • Experience with Cluster API or similar a big plus.
  • Experience working on high-performance compute, networking, and/or storage a big plus.
  • Experience virtualizing GPUs and/or InfiniBand a big plus.
  • Strong systems knowledge across compute, networking, and storage.
  • Experience with infrastructure automation tools, monitoring/observability stacks, and CI/CD pipelines.
  • Experience building IaaS or PaaS systems at scale a plus.
  • Experience with DPUs/SmartNICs a plus.
  • GPU programming, NCCL, CUDA knowledge a plus.
The role offers:
  • Competitive compensation.
  • Startup equity.
  • Health insurance.
  • Flexibility in terms of remote work.