Job Description
Altana is seeking a Senior Manager, Technical Operations & Observability to lead teams ensuring the operational health and efficiency of its technical infrastructure. This role encompasses Observability, Site Reliability Engineering (SRE), Incident Management, and internal IT Operations. The Senior Manager will collaborate with Cloud Engineering, Developer Experience, and Information Security teams, focusing on production infrastructure cost and utilization. They will leverage data from Observability and FinOps to enhance resilience and cost-efficiency.
Responsibilities include defining and implementing monitoring strategies, championing SRE principles, refining incident response processes, overseeing IT infrastructure, and collaborating on FinOps strategies.
Responsibilities:
  • Define and execute the technical strategy for the team.
  • Work closely with peer engineering teams to influence infrastructure cost optimization strategies.
  • Collaborate with engineering teams to ensure new services are designed and built with operability, reliability, and cost-efficiency in mind.
  • Stay current with industry trends and best practices.
  • Lead, mentor, and develop high-performing teams.
  • Champion a culture of proactive monitoring, operational readiness, and continuous learning.
  • Manage on-call rotations and ensure effective response procedures are in place.
  • Oversee the management and continuous improvement of our internal IT infrastructure.
  • Lead the incident management process for production issues.
  • Drive automation efforts across operational tasks.
Requirements:
  • Experience building, leading, and developing technical operations, SRE, or IT teams.
  • Experience in implementing and managing observability platforms and practices.
  • Strong understanding of Site Reliability Engineering principles and practices.
  • Experience with IT operations, including managing internal infrastructure and services.
  • Familiarity with FinOps principles and experience leveraging cost data.
  • Experience in leading and improving incident management processes.
  • Proficiency in cloud platforms such as AWS, Azure, or GCP.
  • Strong understanding of monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog).
  • Excellent problem-solving, communication, and leadership skills.
  • Ability to work effectively in a fast-paced, dynamic environment.
Altana offers:
  • Flexible Time Off
  • Paid Parental Leave
  • Health Benefits
  • Supplemental Benefits
  • 401(k) Savings
  • Commuter Benefits
  • Wellness Programs
  • Pet Insurance
  • Employee Assistance Program
  • Dependent Care FSA