Job Description
Anthropic is seeking a Research Scientist/Engineer to join its Finetuning Alignment team, focusing on honesty in AI systems. The ideal candidate will spearhead the development of techniques to minimize hallucinations and enhance truthfulness in language models. The role involves building robust systems that are accurate, reflect true levels of confidence, and avoid being deceptive or misleading, ensuring models maintain high standards of accuracy and honesty across diverse domains.
Responsibilities include:
  • Designing and implementing novel data curation pipelines.
  • Developing specialized classifiers to detect potential hallucinations.
  • Creating and maintaining comprehensive honesty benchmarks.
  • Implementing techniques to ground model outputs in verified information.
  • Designing and deploying human feedback collection pipelines.
  • Designing and implementing prompting pipelines to improve model accuracy.
  • Developing and testing novel RL environments.
  • Creating tools to help human evaluators assess model outputs.
Requirements:
  • MS/PhD in Computer Science, ML, or related field.
  • Strong programming skills in Python.
  • Industry experience with language model finetuning and classifier training.
  • Proficiency in experimental design and statistical analysis.
  • Genuine concern for AI safety and for the accuracy and honesty of AI systems.
  • Experience in data science or dataset curation for finetuning LLMs.
  • Understanding of uncertainty, calibration, and truthfulness metrics.
Anthropic offers:
  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours.