Job Description
Perplexity is seeking an AI Inference Engineer to join its growing team in London. The ideal candidate will have experience with ML systems, deep learning frameworks, and real-time model serving at scale. This role offers the opportunity to work on large-scale deployment of machine learning models for real-time inference.
Responsibilities:
- Develop AI inference APIs for internal and external customers.
- Benchmark and address bottlenecks throughout the inference stack.
- Improve the reliability and observability of serving systems.
- Explore novel research and implement LLM inference optimizations.
Qualifications:
- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX).
- Familiarity with common LLM architectures and inference optimization techniques.
- Experience with deploying reliable, distributed, real-time model serving at scale (Optional).
- Understanding of GPU architectures or experience with GPU kernel programming using CUDA (Optional).
Perplexity offers:
- Comprehensive health, dental, and vision insurance.
- 401(k) plan.
- Equity may be part of the total compensation package.