Job Description
Moniepoint, a fast-growing financial services platform in Africa, is seeking a Site Reliability Engineer to keep its systems running smoothly and efficiently. The role combines real-time on-call responsibilities with strategic engineering work to improve system resilience and scalability. The Site Reliability Engineer will be responsible for the stability, integrity, and operation of Moniepoint's production applications: supporting and monitoring them, driving optimizations, and providing root cause analysis with recommendations for improvement.

Responsibilities:
  • Participate in on-call rotations, triaging and resolving service issues.
  • Act as Incident Commander during major incidents, coordinating teams and providing updates.
  • Investigate and resolve escalated customer complaints related to performance and reliability.
  • Participate in feature development discussions to ensure observability.
  • Create and maintain monitoring dashboards and alerts.
  • Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

Requirements:
  • Minimum 3 years of experience in an SRE or similar role.
  • Strong knowledge of cloud infrastructure, Kubernetes, and container orchestration tools.
  • Experience with APM and observability platforms (e.g., New Relic, Datadog, ELK, SigNoz).
  • Proficiency in setting up and maintaining monitoring dashboards using Grafana and Prometheus.
  • Skilled in diagnosing issues using stack traces, log files, and APIs.
  • Proficiency in SQL databases (e.g., MySQL) and hands-on experience in database administration.

What Moniepoint Offers:
  • Culture that values people and inclusivity.
  • Learning and development-focused environment.
  • Attractive salary, pension, health insurance, and annual bonus.