Launchpad, a people-first technology company, is seeking a Senior CloudOps Engineer to ensure the reliability, security, and performance of its cloud-based infrastructure. The role covers cloud operations, monitoring, incident response, and system resilience, and includes leading automation efforts and driving DevSecOps best practices. The engineer will collaborate with development, security, and platform teams to optimize cloud environments, manage incidents, and improve system availability and operational efficiency.
Responsibilities:
- Lead cloud operations efforts, ensuring high availability, security, and performance of cloud environments.
- Provide Level 3 support for cloud infrastructure, troubleshooting escalated issues and conducting root cause analysis (RCA) for incidents.
- Monitor and optimize cloud environments using tools like Datadog, Prometheus, and Azure Monitor to proactively detect and resolve issues.
- Automate operational tasks using scripting languages such as Python, Bash, or PowerShell.
- Implement and manage disaster recovery and backup solutions.
- Perform security vulnerability remediation and maintain compliance with security best practices.
- Manage patching, updates, and system maintenance.
- Collaborate with development teams to improve cloud reliability, scalability, and performance.
- Define and refine incident response processes.
- Enhance CI/CD pipelines and cloud automation strategies.
- Stay current with emerging cloud trends and recommend new technologies.
Requirements:
- 5+ years of experience in CloudOps or Cloud Engineering, with strong expertise in Azure.
- Extensive experience in monitoring, observability, and incident response, using tools like Datadog, Prometheus, and Azure Monitor.
- Strong troubleshooting and problem-solving skills, with hands-on experience in root cause analysis (RCA) and incident management.
- Knowledge of security best practices, vulnerability management, and remediation in cloud environments.
- Expertise in automation and scripting using Python, Bash, or PowerShell.
- Experience with backup and disaster recovery strategies in cloud operations.
- Hands-on experience with Infrastructure-as-Code (IaC) tools such as Terraform and Ansible.
- Strong networking and security knowledge.
- Familiarity with Kubernetes, Docker, and CI/CD pipelines.
- Strong project management skills.
- Excellent communication skills.
Launchpad offers:
- 100% remote work
- Excellent compensation in US Dollars
- Hardware setup for working from home
- Work with global teams
- Training allowances
- Personal time off (PTO)