Job Description
CoreWeave is seeking an Operations Engineer to join its Fleet Reliability Operations team. This role is crucial for maintaining the uptime and provisioning of CoreWeave's expanding fleet of server nodes. The engineer will be at the forefront of configuring, updating, and troubleshooting high-tier supercomputing clusters and their dependencies.
Role involves:
- Configuring and maintaining large-scale, high-performance supercomputing clusters.
- Troubleshooting hardware and software issues.
- Monitoring and analyzing system performance.
- Creating and maintaining documentation.
- Participating in on-call rotations.
Requirements:
- 2+ years of experience in troubleshooting or administering data center or on-prem infrastructure.
- Strong understanding of Linux system administration and networking concepts.
- Ability to troubleshoot hardware and software issues.
- Bachelor’s degree in a related field or equivalent experience.
What CoreWeave offers:
- Family-level Medical Insurance
- Family-level Dental Insurance
- Generous Pension Contribution
- Life Assurance at 4x Salary
- Critical Illness Cover
- Employee Assistance Programme
- Tuition Reimbursement
- Work culture focused on innovative disruption