Research Scientist/Engineer, Alignment Finetuning

Research Scientist/Engineer to develop alignment finetuning techniques at Anthropic.

Anthropic

Hybrid

On-Site

United States

USD 280,000 - 425,000

Job Description

Anthropic is seeking a Research Scientist/Engineer to join their Alignment Finetuning team. The successful candidate will be responsible for developing and implementing techniques to train language models that are more aligned with human values. This role involves creating novel finetuning techniques and using them to improve model behavior demonstrably.

The Research Scientist/Engineer will work on synthetic data generation and advanced training pipelines. They will also collaborate across teams to integrate alignment improvements into production models and develop processes to automate and scale the team's work.

Responsibilities:

Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
Use these to train models to have better alignment properties including honesty, character, and harmlessness
Create and maintain evaluation frameworks to measure alignment properties in models
Collaborate across teams to integrate alignment improvements into production models
Develop processes to help automate and scale the work of the team

Requirements: