Senior Site Reliability Engineer

Hiring Organization: Confidential

Our client is looking for a Senior Site Reliability Engineer to join the team and help build, automate, and secure the infrastructure that powers a cutting-edge cyber range platform. You will work across the full breadth of our SRE practice (traditional site reliability, DevOps, and DevSecOps), supporting deployments across self-hosted data centers, customer-provided hardware, and pre-packaged appliance environments. Collaboration is key: you’ll partner with engineering teams across the organization, contribute to infrastructure planning, and mentor junior team members. This role balances hands-on delivery with long-term automation thinking and requires someone who builds well-engineered tooling at scale rather than relying on manual fixes.

Who you are:

  • You bring strong software engineering skills beyond scripting — you write production-quality code and build maintainable tooling
  • You think in terms of reliability and operability, always looking to automate and improve
  • You enjoy helping other engineers solve problems and can context-switch between deep project work and support requests
  • You’re security-minded, with a practical approach to hardening infrastructure and deployments
  • You’re comfortable operating in complex, multi-environment setups including on-premises and air-gapped models
  • You build trust across teams through reliability, follow-through, and clear communication
  • You’re open to feedback and passionate about sharing knowledge to raise the team’s overall capability

What you’ll be doing:

  • Designing and building infrastructure automation for consistent, repeatable deployments across SimSpace-hosted, customer-provided, and appliance environments
  • Developing and maintaining CI/CD pipelines using GitHub Actions and ArgoCD to improve build reliability and developer experience
  • Managing and evolving Kubernetes-based infrastructure, including application packaging and deployment workflows using Grafana Tanka and Kustomize
  • Building and maintaining observability tooling using the Grafana stack for monitoring, alerting, logging, and dashboards
  • Identifying and resolving performance and reliability issues including pod scaling, resource allocation tuning, and latency bottleneck analysis
  • Hardening deployment pipelines and runtime environments through container security, network segmentation, image scanning, and vulnerability management
  • Serving as a hands-on infrastructure partner to engineering teams across the organization
  • Contributing to incident response through a light on-call rotation and driving post-incident improvements
  • Mentoring junior and mid-level SRE team members

Languages and Tools we use:

  • GitHub Actions, ArgoCD, Kubernetes, Grafana stack, Grafana Tanka, Kustomize, VMware

Requirements:

  • 5–7 years of experience in site reliability, DevOps, or infrastructure engineering
  • Hands-on experience with Kubernetes in production, including deployment tooling, cluster operations, and performance tuning
  • Solid experience building and maintaining CI/CD pipelines, preferably with GitHub Actions, ArgoCD, or similar GitOps tooling
  • Practical understanding of infrastructure-as-code principles and configuration management
  • Experience with observability and monitoring tools, preferably the Grafana stack
  • Security-minded approach to infrastructure, including container and network security
  • Working knowledge of VMware virtualization environments
  • Strong written and verbal communication skills

Nice to have:

  • Experience delivering software to customer-managed, on-premises, or air-gapped environments
  • Familiarity with compliance-driven or security-hardened deployment environments
  • Experience with vulnerability scanning tools and security automation
  • Background in cybersecurity or networking contexts
