Our client is looking for a Senior Site Reliability Engineer to join its team and help build, automate, and secure the infrastructure that powers a cutting-edge cyber range platform. You will work across the full breadth of the SRE practice, spanning traditional site reliability, DevOps, and DevSecOps, and support deployments across self-hosted data centers, customer-provided hardware, and pre-packaged appliance environments. Collaboration is key: you’ll partner with engineering teams across the organization, contribute to infrastructure planning, and mentor junior team members. This role balances hands-on delivery with long-term automation thinking, and requires someone who builds well-engineered tooling at scale rather than relying on manual fixes.

Who you are:

You bring strong software engineering skills beyond scripting — you write production-quality code and build maintainable tooling
You think in terms of reliability and operability, always looking to automate and improve
You enjoy helping other engineers solve problems and can context-switch between deep project work and support requests
You’re security-minded, with a practical approach to hardening infrastructure and deployments
You’re comfortable operating in complex, multi-environment setups including on-premises and air-gapped models
You build trust across teams through reliability, follow-through, and clear communication
You’re open to feedback and passionate about sharing knowledge to raise the team’s overall capability

What you’ll be doing:

Designing and building infrastructure automation for consistent, repeatable deployments across SimSpace-hosted, customer-provided, and appliance environments
Developing and maintaining CI/CD pipelines using GitHub Actions and ArgoCD to improve build reliability and developer experience
Managing and evolving Kubernetes-based infrastructure, including application packaging and deployment workflows using Grafana Tanka and Kustomize
Building and maintaining observability tooling using the Grafana stack for monitoring, alerting, logging, and dashboards
Identifying and resolving performance and reliability issues including pod scaling, resource allocation tuning, and latency bottleneck analysis
Hardening deployment pipelines and runtime environments through container security, network segmentation, image scanning, and vulnerability management
Serving as a hands-on infrastructure partner to engineering teams across the organization
Contributing to incident response through a light on-call rotation and driving post-incident improvements
Mentoring junior and mid-level SRE team members

Languages and tools we use:

GitHub Actions, ArgoCD, Kubernetes, Grafana Stack, Grafana Tanka, Kustomize, VMware

Requirements:

5–7 years of experience in site reliability, DevOps, or infrastructure engineering
Hands-on experience with Kubernetes in production, including deployment tooling, cluster operations, and performance tuning
Solid experience building and maintaining CI/CD pipelines, preferably with GitHub Actions, ArgoCD, or similar GitOps tooling
Practical understanding of infrastructure-as-code principles and configuration management
Experience with observability and monitoring tools, preferably the Grafana stack
Security-minded approach to infrastructure, including container and network security
Working knowledge of VMware virtualization environments
Strong written and verbal communication skills

Nice to have:

Experience delivering software to customer-managed, on-premises, or air-gapped environments
Familiarity with compliance-driven or security-hardened deployment environments
Experience with vulnerability scanning tools and security automation
Background in cybersecurity or networking contexts

CORPORATE OFFICE

	CL 98 #22 – 64 Office 617 Bogotá, Colombia
	+57 (601) 4673388, +57 (601) 4672296