Job Description

Anthropic is seeking a Research Engineer to join its Interpretability team in San Francisco. The team focuses on reverse-engineering how trained models work in order to make advanced systems safe through mechanistic understanding. The role involves implementing and analyzing research experiments, setting up and optimizing research workflows, and building tools to support rapid experimentation and improve model safety.

The Research Engineer will collaborate with teams across Anthropic, such as Alignment Science and Societal Impacts, to apply interpretability work toward improving model safety. They will also contribute to the Interpretability Architectures project in collaboration with the Pretraining team.

Responsibilities include:

  • Implementing and analyzing research experiments
  • Setting up and optimizing research workflows
  • Building and improving tools, abstractions, and infrastructure to support research

Requirements:

  • 5-10+ years of experience building software
  • Proficiency in at least one programming language (e.g., Python, Rust, Go, Java)
  • Experience contributing to empirical AI research projects
  • Strong ability to prioritize and direct effort

Anthropic offers:

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours