Job Description
Arcesium is seeking a highly skilled Site Reliability Engineer to join its Technology team. The candidate will work within a cross-functional product team, creating solutions for intricate business challenges.
The role involves:
- Deploying, maintaining, and running a highly available, multi-tenant distributed system.
- Automating infrastructure creation and application deployment.
- Contributing to system design and architecture.
- Programming in the core application, including instrumenting code with monitoring metrics, setting up traces, and managing logs.
- Ensuring optimal system performance.
Requirements:
- At least 6 years of experience in an SRE/Operations/DevOps role running distributed systems in production.
- Experience with automated provisioning and management of AWS infrastructure and services.
- Strong knowledge of Linux systems internals and administration.
- Deep experience with Kubernetes and Docker.
- Experience automating the software development, testing, and deployment lifecycle with continuous integration and continuous deployment (CI/CD).
- Experience with scaling, monitoring, and troubleshooting actively running systems.
- Ability to program in Java, C++, or C#.
- Comfortable with configuration management tools such as Ansible, Chef, or Puppet.
- Familiarity with technologies such as Fluentd, key-value datastores, API management and service meshes, Git, and key management.