EOS RPO

Senior Systems Operations Engineer

Posted May 7, 2026
Project ID: R-538059
Location: Hyderabad, Telangana
Hours/week: 40
Application Deadline: May 30, 2026 9:40 PM

In this role, you will:

  • Work on complex, broad-impact initiatives, including providing high-level systems consultation for the technology teams

  • Work as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area

  • Review and analyze complex technical challenges and escalated support issues related to core business solutions that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems

  • Make decisions on technical changes and enhancements

  • Consult with the engineering team on change design that requires a solid understanding of the technical process controls or standards that influence and drive new initiatives

  • Collaborate and consult with technical peers, colleagues, and mid-level to more experienced managers to resolve systems support issues and achieve goals

Required Qualifications:

  • 4+ years of Systems Engineering or Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education

Desired Qualifications:

  • Contribute to the execution of monitoring and observability initiatives across infrastructure and applications.

  • Design, build, and maintain dashboards, alerts, and telemetry pipelines using tools such as Grafana, Prometheus, Elastic APM, and SPLoC.

  • Work with and support observability platforms including Splunk, AppDynamics, ThousandEyes, and ITRS Geneos.

  • Experience working in UNIX/Linux environments.

  • Collaborate closely with SRE and DevOps teams to support system reliability, scalability, and performance.

  • Develop and maintain automation scripts in Python and Shell for data collection, analysis, and alerting.

  • Participate in root cause analysis and incident response activities using observability data.

  • Contribute to the evaluation and adoption of Gen AI–based capabilities to improve observability, anomaly detection, and predictive insights.

  • Exposure to Gen AI / Agentic AI concepts: Basic familiarity or interest in Gen AI or agentic AI solutions, with some hands‑on exposure to areas such as prompt design, understanding agent/workflow concepts, and integrating AI solutions with applications or data sources.

  • Awareness of AI in observability use cases: Exposure to using AI or ML techniques with observability data (logs, metrics, traces), such as supporting anomaly detection, alert analysis, noise reduction, or assisting with issue diagnosis in production or pre‑production environments.

  • Exposure to operational use of AI solutions: Some experience or involvement in validating AI‑based solutions in real environments, learning how effectiveness is measured, iterating based on feedback, and working alongside existing observability or automation tools to improve reliability and efficiency.

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).

  • 5–7 years of experience in IT operations, monitoring, observability, or SRE-related roles.

  • Hands-on experience with tools such as Splunk, ITRS Geneos, Grafana, Prometheus, Elastic APM, ThousandEyes, and AppDynamics.

  • Working knowledge of scripting in Python (including automation, data analysis, or basic ML/Gen AI integrations) and Shell.

  • Good understanding of SRE fundamentals such as SLIs, SLOs, error budgets, and incident management.

  • Exposure to cloud platforms (AWS, Azure, or GCP) and containerized environments (Docker, Kubernetes).

  • Familiarity with CI/CD pipelines and infrastructure-as-code tools (Terraform, Ansible) is a plus.

  • Strong analytical and problem-solving skills with a data-driven mindset.

  • Clear communication skills and the ability to work effectively with cross-functional stakeholders.

Job Expectations:

  • The team operates on a 16x5 schedule, ensuring coverage across critical business hours and extended support windows.

  • Candidates must be willing to participate in weekend on-call rotations, providing support for high-priority incidents and system health checks.

  • As part of production management responsibilities, the lead is expected to be available during off-hours when necessary to support major incidents, deployments, or escalations.

  • Flexibility and responsiveness are key.

