Research Scientist, Interpretability

Research Scientist role at Anthropic, focusing on mechanistic interpretability.

Anthropic

USD 315,000 - 560,000

Job Description

Anthropic is seeking a Research Scientist to join its Interpretability team in San Francisco. The team reverse-engineers how trained language models work, with the goal of making advanced systems safe. The ideal candidate will have a strong research background and an interest in team science.

The Research Scientist will contribute to understanding LLMs by reverse-engineering the algorithms learned in their weights, designing and running experiments, creating and analyzing interpretability features and circuits, building infrastructure for experiments, and communicating results.

The role involves: