Job Description

Scale is seeking a Machine Learning Infrastructure Engineer to join its Machine Learning Infrastructure team and help build its Training Platform. The successful candidate will work closely with Machine Learning researchers to understand their needs, and will apply infrastructure expertise and compute resources to speed up experimentation. This role requires a strong foundation in machine learning and backend system design, along with prior experience in ML infrastructure.

The ML Infrastructure Engineer will be responsible for:

  • Building highly available, observable, performant, and cost-effective APIs for model training.
  • Participating in the team’s on-call process to ensure service availability.
  • Owning projects end-to-end, from requirements and scoping to design and implementation, in a collaborative environment.
  • Making informed decisions regarding build vs. buy tradeoffs, with a focus on cost efficiency.

The ideal candidate should possess the following qualifications:

  • 4+ years of experience building machine learning training pipelines or inference services in a production setting.
  • Experience with distributed training frameworks and techniques such as DeepSpeed and FSDP.
  • Experience building, deploying, and monitoring complex microservice architectures.
  • Experience with Python, Docker, Kubernetes, and infrastructure as code (e.g., Terraform).

Preferred qualifications include:

  • Experience with LLM inference latency optimization techniques (e.g., kernel fusion, quantization, dynamic batching).
  • Experience working with a cloud technology stack (e.g., AWS or GCP).

Scale offers:

  • Comprehensive health, dental, and vision coverage.
  • Retirement benefits.
  • A learning and development stipend.
  • Generous PTO.

Scale AI

Scale AI accelerates the development of AI applications across industries. The company's products power advanced language models, generative models, and computer vision models. Scale AI serves generative AI companies, government agencies, and enterprises, assisting organizations in building and deploying AI. Committed to inclusivity and equal opportunity, Scale AI fosters professional growth, offering opportunities to contribute to cutting-edge AI projects and collaborate with experts in the field.
