Job Description
Arcesium is seeking a highly skilled Site Reliability Engineer to join its Technology team. The candidate will work within a cross-functional product team, creating solutions for intricate business challenges.
The role involves:
  • Deploying, maintaining, and running a highly-available, multi-tenant distributed system.
  • Automating infrastructure creation and application deployment.
  • Contributing to system design and architecture.
  • Programming in the core application, including instrumenting code with monitoring metrics, setting up traces, and managing logs.
  • Ensuring optimal system performance.
Requirements:
  • At least 6 years of experience in an SRE/Operations/DevOps role running distributed systems in production.
  • Experience with automated provisioning and management of AWS infrastructure and services.
  • Strong knowledge of Linux systems internals and administration.
  • Deep experience with Kubernetes and Docker.
  • Experience automating the software dev/test/deployment lifecycle with continuous integration and continuous deployment.
  • Experience with scaling, monitoring, and troubleshooting actively running systems.
  • Ability to program in Java, C++, or C#.
  • Comfortable with configuration management tools: Ansible, Chef, Puppet, etc.
  • Familiarity with technologies like Fluentd, key-value datastores, API management/service meshes, Git, and key management.