Wellhub is seeking a Staff Site Reliability Engineer to join its Platform team in Portugal. This role involves building a global, secure, recoverable, and cost-efficient infrastructure that enables engineering teams to scale Wellhub's product autonomously.
The Staff Site Reliability Engineer will contribute to the development of tools and automation to streamline operational processes and create a reliable, real-time logistics platform. The company aims for a completely autonomous, resilient, and reliable technology stack.
Responsibilities:
- Help build a global, secure, scalable, and cost-effective cloud platform using Kubernetes on AWS.
- Develop and evolve Kubernetes operators and other cloud-native automation.
- Build products and tools enabling engineering teams to create and maintain their cloud resources autonomously.
- Help to ensure security and compliance by delivering secure products and implementing DevSecOps integrations.
- Improve observability, reliability, and cost awareness.
- Support engineering teams in using these products and tools.
- Build and maintain a modern set of CI/CD tools and services.
- Keep all the Kubernetes clusters highly available and reliable.
- Contribute to our product documentation (e.g. user guide, configurations, operations, and troubleshooting procedures).
- Participate in the definition of standards, RFCs (Requests for Comments), guidelines, and best practices.
- Inspire and empower others by genuinely caring for your own well-being and that of your colleagues.
- Bring well-being to the forefront of work, and create a supportive environment where everyone feels comfortable taking care of themselves, taking time off, and finding work-life balance.
Requirements:
- Proven technical experience with AWS cloud services, Kubernetes, and software engineering.
- Deep knowledge of Kubernetes and its ecosystem.
- Solid knowledge of observability systems.
- Experience with operator-managed Infrastructure as Code, preferably Crossplane or Kubernetes Operators.
- Ability to write software for production environments.
- Excellent analytical and problem-solving skills, and proven experience in identifying solutions for complex problems.
- Collaboration and learning-driven mindset.
- Excellent verbal and written communication skills in both English and Portuguese.
The Company Offers:
- Access to digital fitness programs and resources.
- Additional fitness subsidy to access onsite gyms and fitness studios.
- Flexible work options (hybrid and fully remote).
- Home office stipend and monthly flexible work allowance.
- A minimum of 25 days of paid holiday per year.
- Paid parental leave.
- Opportunities for personal and career growth.