Job Description
Elastic, the Search AI Company, is seeking a Senior MLOps Engineer to join its Search Inference team. This team focuses on delivering performant and cost-effective machine learning model inference for Search workflows. The ideal candidate will help evolve the inference service to host LLMs, enhance its scalability and reliability, and improve its cost efficiency. They will also adapt existing solutions to use the inference service, ensuring a seamless transition.

The role involves:
  • Evolving the inference service to host LLMs.
  • Enhancing the scalability and reliability of the service.
  • Improving the cost efficiency of the platform.
  • Adapting existing solutions to use the inference service.

Requirements:
  • 5+ years working in an MLOps or related ML Engineering role.
  • Production experience self-hosting and operating LLMs at scale for generative tasks via an inference framework such as Ray or KServe.
  • Production experience running and tuning specialized hardware for Generative AI workloads, especially GPUs via CUDA.
  • Measured and articulate written and spoken communication skills.
  • An interest in learning new tools, workflows and philosophies.

Elastic offers:
  • Competitive pay.
  • Health coverage for you and your family in many locations.
  • Flexible locations and schedules for many roles.
  • Generous number of vacation days each year.
  • Matching for financial donations and service.
  • Parental leave.