Job Description
Inworld AI, a leading provider of AI technology, is seeking a Staff Platform Engineer (MLOps) to collaborate with backend and ML Engineering teams. The role involves designing, deploying, and maintaining reliable, high-performance, and secure cloud infrastructure for Inworld AI's AI Engine and Studio. This role is based in Vancouver, British Columbia, Canada. Inworld AI is backed by top-tier investors and powers experiences for companies like Ubisoft and NVIDIA.
What this role involves:
- Developing, managing, and optimizing the ML model lifecycle in production.
- Implementing CI/CD systems for ML workflows.
- Monitoring models to identify issues and inefficiencies.
- Designing MLOps tools and frameworks to enhance automation and efficiency.
- Facilitating a "you build it, you run it" culture.
- Managing CI/CD pipelines.
- Identifying and implementing opportunities to enhance engineering speed and efficiency.
- Conducting root cause analysis and developing automated solutions.
- Developing and sharing best practices.
Requirements:
- 7 years of experience in software engineering.
- 5 years of experience with infrastructure-as-code.
- Proficiency in managing Kubernetes clusters and applications.
- Experience in creating and maintaining CI/CD pipelines.
- Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud).
- Proficiency in at least one backend programming or scripting language, such as Golang, Python, or Bash.
- Familiarity with open-source LLMs and open-source serving solutions (e.g., vLLM, llama.cpp, KServe) is a plus.
- Experience with SLURM.
- Experience with data pipeline and workflow management tools.
- Experience with bare metal GPUs (optional).
What Inworld AI offers: