Job Description
Anthropic is seeking a Staff Engineer, Data Infra to join its team in San Francisco. Anthropic is dedicated to developing safe, ethical, and powerful artificial intelligence. The ideal candidate will have strong software engineering skills and experience in building distributed systems.
Responsibilities:
- Design and implement high-performance data processing infrastructure for large language model training
- Develop and maintain core processing primitives (e.g., tokenization, deduplication, chunking) with a focus on scalability
- Build robust systems for data quality assurance and validation at scale
- Implement comprehensive monitoring systems for data processing infrastructure
- Create and optimize distributed computing systems for processing web-scale datasets
- Collaborate with research teams to implement novel data processing architectures
- Build and maintain documentation for infrastructure components and systems
- Design and implement systems for reproducibility and traceability in data preparation
Qualifications:
- 5+ years of experience, not counting internships
- Strong software engineering skills with experience in building distributed systems
- Expertise in Python and hands-on experience with distributed computing frameworks, particularly Apache Spark
- Deep understanding of cloud computing platforms and distributed systems architecture
- Experience with high-throughput, fault-tolerant system design
- Strong background in performance optimization and system scaling
- Excellent problem-solving skills and attention to detail
- Strong communication skills and ability to work in a collaborative environment
The role offers:
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours