Anthropic is seeking a Research Engineer to join its Interpretability team in San Francisco. The team focuses on reverse-engineering how trained models work in order to make advanced systems safe through mechanistic understanding. The role involves implementing and analyzing research experiments, optimizing research workflows, and building tools that support rapid experimentation and improve model safety.
The Research Engineer will collaborate with teams across Anthropic, such as Alignment Science and Societal Impacts, to apply interpretability findings to improving model safety. They will also contribute to the Interpretability Architectures project in collaboration with the Pretraining team.
Responsibilities include:
Requirements:
Anthropic offers: