EOS RPO

Senior Systems Operations Engineer

Posted Apr 2, 2026
Project ID: R-518242
Location
Bangalore, Karnataka
Hours/week
40 hrs/week
Application Deadline: Apr 15, 2026 10:26 PM

Required Qualifications:

  • 4+ years of Systems Engineering or Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.

Desired Qualifications:

  • Strong experience in large-scale distributed systems; 5+ years hands-on SRE/DevOps/Platform Engineering.

  • Cloud: One or more—AWS / Azure / GCP (certifications a plus).

  • IaC & Automation: Terraform, Ansible/Chef; solid Git practices (GitOps 

  • Observability: Prometheus, Grafana, OpenTelemetry, ThousandEyes, AppDynamics, Aternity.

  • CI/CD: Azure DevOps, GitHub Actions, Jenkins, or GitLab CI; artifact mgmt and environment promotions.

  • Programming: One of Python/Go/Java (scripting + API integrations).

  • Reliability Practices: SLIs/SLOs, error budgets, capacity planning, canary/blue-green deployments, chaos/DR testing.

  • Processes: Incident/Problem/Change management, blameless postmortems, runbook design, on-call best practices. Strong documentation and communication skills.

Job Expectations:

  • Define and implement SLIs/SLOs and error budgets for critical services; drive SLO adoption across teams.

  • Build and tune observability (metrics/logs/traces) with golden signals (latency, traffic, errors, saturation).

  • Partner with Performance Engineering to run load/stress/soak tests and remove performance bottlenecks.

  • Platform & Automation: Eliminate toil; generate AI-based observability assessments and maturity scorecards for all applications.

  • Create self-service reliability tooling (runbooks, bots, reliability checks, golden paths).

  • Incident, Problem & Change: Lead high-severity incidents (Major/SEV1), facilitate blameless postmortems, and track corrective actions.

  • Culture & Enablement: Coach product and ops teams on SRE principles; define maturity models and track adoption.

  • Build documentation and keep it current: runbooks, dashboards, readiness checklists, and reliability reviews.
