Job Description
GoFundMe is seeking a Site Reliability Engineer II to join their team in Buenos Aires, Argentina. This role is crucial for ensuring the reliability, scalability, and performance of GoFundMe's platform and applications. The Site Reliability Engineer will work closely with development teams, operations teams, and technology vendors to maintain high availability and optimize system performance. This position requires in-office presence 2-3 times a week.
Responsibilities:
- Design and build cloud infrastructure on AWS.
- Participate in performance analysis, tuning, and capacity planning.
- Manage the availability, scalability, security, and performance of the platform and applications.
- Diagnose bottlenecks and provide recommendations for resolution.
- Assess monitoring requirements and implement enhancements.
- Review and implement changes to the live infrastructure.
- Improve the SLO/SLI framework.
- Use data analysis to identify trends and potential problems.
- Perform 24/7 on-call duties.
Requirements:
- 3+ years of experience in operating high-traffic SaaS environments.
- Expertise in delivering high availability.
- Skills to build a fully automated cloud orchestration framework on AWS.
- Experience running containerized infrastructure in production (Kubernetes using EKS, AWS ECS).
- Experience implementing configuration management and automation solutions using Infrastructure as Code, CI/CD, and GitOps (Ansible, Terraform, ArgoCD, GitHub Actions).
- Strong working knowledge of Linux.
- Solid scripting skills (e.g., Bash, Python).
- Experience with performance diagnostics, performance tuning, capacity planning, and monitoring.
- BS in Computer Science or equivalent.
- Good verbal and written communication skills.
Benefits:
- Competitive pay and comprehensive healthcare benefits.
- Financial assistance for hybrid work and family planning.
- Generous parental leave and flexible time-off policies.
- Mental health and wellness resources.
- Learning, development, and recognition programs.