Job Description
Perplexity, a conversational AI company, is seeking an AI Inference Engineer to join its expanding team. The ideal candidate will contribute to the large-scale deployment of machine learning models for real-time inference. This role offers the opportunity to work with cutting-edge technologies and optimize AI inference performance.
Responsibilities:
- Develop APIs for AI inference.
- Benchmark and address bottlenecks in the inference stack.
- Improve system reliability and observability.
- Implement LLM inference optimizations.
Qualifications:
- Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX).
- Familiarity with LLM architectures and inference optimization techniques.
- Understanding of GPU architectures or experience with CUDA.
Perplexity offers:
- Comprehensive health, dental, and vision insurance.
- 401(k) plan.
- Equity may be part of the total compensation package.