Job Description
Diligent is seeking a Staff Site Reliability Engineer to design, build, and optimize its AWS-based infrastructure and developer platform. The ideal candidate will have a strong software engineering background and a DevOps mindset.
The Staff Site Reliability Engineer will be responsible for:
- Architecting, developing, and maintaining scalable cloud infrastructure using AWS CDK, CDK for Terraform (CDKTF), and Terraform.
- Building self-service tools and automation pipelines to enhance developer productivity and system reliability.
- Driving observability, cost optimization, and security best practices across cloud environments.
- Collaborating with engineering and SRE teams to improve system performance, maintainability, and scalability.
- Leading and mentoring engineers and advocating for software engineering best practices within the SRE team.
- Ensuring high availability and fault tolerance of cloud services through proactive monitoring and automation.
The ideal candidate should have:
- 7–10 years of professional experience in software engineering, DevOps, or site reliability engineering.
- Expert-level experience with AWS, including services such as EC2, Lambda, ECS, Fargate, S3, IAM, VPC, Route 53, RDS, DynamoDB, and CloudWatch.
- Strong object-oriented programming (OOP) skills, with experience in TypeScript and Python.
- Deep experience with Infrastructure as Code (IaC) using AWS CDK, CDK for Terraform (CDKTF), or Terraform.
- Proficiency in designing, implementing, and testing scalable software architectures.
- Experience with CI/CD pipelines (GitHub Actions, AWS CodePipeline, or similar).
- Strong automation and scripting abilities for operational workflows and cloud infrastructure.
- Understanding of cloud security best practices, IAM policies, and compliance frameworks.
- Excellent problem-solving skills, technical leadership, and ability to mentor other engineers.
Diligent offers:
- A flexible work environment
- Comprehensive health benefits
- Generous time off policy
- Wellness programs