[Remote] Senior Site Reliability Engineer

Remote, USA Full-time Posted 2026-06-16

Note: This is a remote position open to candidates in the USA. Ellucian powers innovation for higher education, serving over 21 million students globally. The company is seeking a Senior Site Reliability Engineer to ensure the reliability, performance, and cost-efficiency of its production systems, with a focus on DevOps practices and incident management.

Responsibilities

Own and improve system reliability, availability, and performance for production environments
Design, implement, and manage monitoring, alerting, and observability using DataDog (required)
Lead incident response efforts, including troubleshooting, mitigation, and post-incident reviews
Perform detailed root cause analysis (RCA) and drive permanent resolutions
Partner with engineering and DevOps teams to build scalable, resilient infrastructure
Automate operational processes to improve efficiency and reduce risk
Analyze and optimize infrastructure and application costs
Define and manage SLIs/SLOs to meet reliability targets
Continuously improve deployment, monitoring, and operational practices

Skills

5+ years of experience in Site Reliability Engineering, DevOps, or similar roles
Strong, hands-on expertise with DataDog (APM, logs, metrics, dashboards, alerting)
Experience with cloud platforms (AWS, Azure, or GCP)
Proficiency in DevOps practices and tools (CI/CD, Infrastructure as Code such as Terraform)
Strong troubleshooting skills and experience conducting root cause analysis in distributed systems
Experience with containers and orchestration (Docker, Kubernetes)
Scripting or programming experience (Python, Bash, or similar)
Proven ability to analyze and optimize cloud costs
Experience with cost management tools (e.g., AWS Cost Explorer, Azure Cost Management)
Familiarity with cloud security and compliance best practices
Experience supporting high-availability, customer-facing systems
Strong collaboration and communication skills

Benefits

Comprehensive health coverage: medical, dental, and vision
Flexible time off
Thrive Flex Lifestyle Account (LSA) that allows you to contribute towards your health, financial or learning interests
401(k) with match and BrightPlan to help you save for the future
Parental Leave
5 charitable days to support the community that supports us
Telemedicine
Wellness
Headspace Care (mental health)
Wellbeats (virtual fitness classes)
RethinkCare & Wellthy (caregiver support)
Diversity and inclusion programs which provide access to internal employee resource groups
Employee referral bonuses to encourage the addition of great new people to the team
We foster a learning culture with:
Education Assistance Program
Professional development opportunities

Company Overview

Ellucian delivers the software, services, and insights that help your institution thrive. It was founded in 1968 and is headquartered in Fairfax, Virginia, USA, with a workforce of 1001-5000 employees. Its website is http://www.ellucian.com.

Company H1B Sponsorship

Ellucian has a track record of offering H1B sponsorships, with 2 in 2026, 31 in 2025, 27 in 2024, 28 in 2023, 31 in 2022, 33 in 2021, 30 in 2020. Please note that this does not guarantee sponsorship for this specific role.
