[Remote] GOV Site Reliability Engineer
Note: This is a remote position open to candidates in the USA. Veeam Software is the Data and AI Trust Company, focusing on data resilience and security. They are seeking a Site Reliability Engineer to support their Government and Sovereign Cloud environment. The role involves incident response, reliability improvements, and collaboration across engineering and compliance teams.
Responsibilities
- Get up to speed on VDC workloads, dependencies, and operational workflows by reading code, docs, and working with SMEs
- Write and maintain runbooks, incident guides, and operational documentation
- Support knowledge transfer and contribute to onboarding materials for the team
- Participate in incident response including triage, investigation, mitigation, and postmortems
- Help implement and maintain SLIs, SLOs, and error budgets defined by the team
- Identify reliability issues during incidents or reviews and propose concrete improvements
- Support high availability and fault tolerance work on Azure, including Azure Government
- Close monitoring gaps by implementing instrumentation, alerting, and dashboards based on team standards
- Contribute to toil reduction through automation and tooling improvements
- Participate in on-call rotations
- Work with IaC, CI/CD pipelines, and deployment tooling in compliance-restricted environments
- Support testing, canary deployments, and release validation workflows
- Implement changes to infrastructure and configuration following established patterns and review processes
- Work with engineering, security, compliance, and operations teams to execute on reliability improvements
- Communicate clearly about system behavior, risk, and status, both in writing and in meetings
- Raise blockers and gaps proactively; don't wait for problems to escalate
Skills
- 3+ years in Software Engineering, with at least 1 year in SRE, Platform Engineering, or DevOps working on cloud-hosted services
- Experience with cloud infrastructure on Azure or a comparable cloud provider
- Familiarity with regulated or compliance-oriented environments such as government (FedRAMP, CMMC), financial (PCI-DSS), or healthcare (HIPAA). You understand that compliance shapes what you can and can't do operationally
- Able to read and understand code well enough to investigate system behavior without always having someone walk you through it
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry, ELK stack)
- Experience with IaC tools (Terraform, Terragrunt, or Pulumi) and container orchestration (Kubernetes)
- Experience with CI/CD tooling such as GitHub Actions, Azure DevOps, GitLab CI, or ArgoCD
- Strong programming skills in one or more of: TypeScript/JS, Go, Java, C#, or similar
- Solid understanding of distributed systems fundamentals and networking basics
- Clear written and verbal communication skills
- Experience in Government or Sovereign Cloud environments (e.g., Azure Government, AWS GovCloud)
- Background in SaaS platforms or multi-tenant systems
- Familiarity with chaos engineering, resilience testing, or load testing
- Exposure to building or improving reliability practices on a team
- Familiarity with AI-first development workflows that use LLM-powered tools for automation, code generation, or documentation
Benefits
- Unlimited paid time off, 12 paid holidays (including 4 global VeeaMe Days for self-care), and 24 paid volunteer hours annually through Veeam Cares
- Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
- Medical, dental, and vision coverage starting on your first day
- Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
- 401(k) retirement plan with company matching contributions
- Fertility, adoption, and surrogacy support through Maven
- AirVet: 24/7 virtual veterinary care at no cost
- Legal services, identity protection, and supplemental health insurance options
- Tax-advantaged spending accounts for healthcare, dependent care, and commuting
- Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning