Job Description
Anthropic is seeking a Research Scientist to join its Interpretability team in San Francisco. The team focuses on reverse-engineering how trained language models work in order to make advanced systems safe. The ideal candidate will have a strong research background and an interest in team science.
The Research Scientist will contribute to understanding LLMs by reverse-engineering the algorithms learned in their weights, designing and running experiments, creating and analyzing interpretability features and circuits, building infrastructure for experiments, and communicating results.
The role involves:
- Developing methods for understanding LLMs
- Designing and running experiments
- Creating and analyzing interpretability features and circuits
- Building infrastructure for running experiments and visualizing results
- Communicating results internally and publicly
Requirements:
- Strong track record of scientific research
- Enjoyment of team science
- Comfort with messy experimental science
- A view of research and engineering as two sides of the same coin
- Ability to articulate and discuss the motivations behind work
- Familiarity with Python
- Bachelor's degree in a related field or equivalent experience
Anthropic offers:
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours