Scale is seeking a Machine Learning Research Scientist/Research Engineer to focus on LLM Evaluation. This role involves advancing the evaluation and benchmarking of large language models (LLMs) and contributing to industry-leading LLM leaderboards. The ideal candidate will develop rigorous, scalable, and fair evaluation methodologies to drive the next generation of AI capabilities. They will collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols. Scale encourages collaborations within the industry and academia, and supports the publication of research findings.
The role involves:
Designing and developing novel evaluation benchmarks for large language models.
Conducting research on the effectiveness and limitations of existing LLM evaluation techniques.
Collaborating with internal teams and external partners to refine metrics and create standardized evaluation protocols.
Implementing scalable and reproducible evaluation pipelines using modern ML frameworks.
Publishing research findings in top-tier AI conferences and contributing to open-source benchmarking initiatives.
Requirements:
Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field.
Strong background in deep learning and LLMs, with experience in model evaluation.
Familiarity with benchmarking tools and datasets for LLM evaluation.
Hands-on experience with large-scale model training and deployment.
Excellent written and verbal communication skills.
Published machine learning research at major conferences and/or in journals.
Scale AI accelerates the development of AI applications across industries. The company's products power advanced language models, generative models, and computer vision models. Scale AI serves generative AI companies, government agencies, and enterprises, assisting organizations in building and deploying AI. Committed to inclusivity and equal opportunity, Scale AI fosters professional growth, offering opportunities to contribute to cutting-edge AI projects and collaborate with experts in the field.