Job Description
IONOS is seeking a Site Reliability Engineer to join its team in Berlin. The role involves collaborating with various teams across IONOS, the leading European provider of cloud infrastructure, cloud services, and hosting services. IONOS emphasizes open structures, a friendly working culture, and flat hierarchies with a strong team spirit, in the belief that work and fun are compatible.
Responsibilities:
- Monitor system performance (uptime, latency, error rates) and lead 24/7 incident response.
- Plan and execute software/hardware deployments across multiple datacenters.
- Conduct regular disaster recovery drills and improve runbooks, alerts, and monitoring thresholds.
- Research, evaluate, and recommend solutions for improving reliability, availability, performance, and security.
- Automate repetitive tasks to improve efficiency.
- Provide level 2 support and handle direct customer contact.
Requirements:
- Proficient in Linux system administration with strong troubleshooting skills.
- Experienced with virtualized environments (QEMU/KVM, OpenStack, Proxmox, Kubernetes).
- Experience with configuration management tools (SaltStack or Ansible) and monitoring tools (Prometheus, Loki, Grafana).
- Experience with code management (merge conflicts, feature branches, merge requests, CI/CD).
- English and German proficiency at B2 level or higher.
The role offers:
- Hybrid working model with home office option.
- Flexible, trust-based working hours.
- Modern office space with very good transport connections.
- Numerous training and development opportunities.
- Various health offerings, such as sports and health courses.