EOS RPO

Senior Software Engineer - Platform Engineering/Application Support, Ansible, Apigee, Puppet

Posted Apr 17, 2026
Project ID: R-521638
Location
Bangalore, Karnataka
Hours/week
40 hrs/week

Required Qualifications:

  • 4+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education

Desired Qualifications

  • 4+ years of software engineering experience

  • 4+ years of application production support experience

  • Education BS/BA degree or higher

  • An industry-standard technology certification

  • Strong verbal, written, and interpersonal communication skills

  • 3+ years of experience with Cloud technologies

  • Knowledge and understanding of Site Reliability Engineering (SRE) concepts

  • 3+ years of Agile experience

  • Advanced scripting skills specifically around automation, log rotation, data collection, error collection and alerting

  • Scripting and automation experience

  • Experience with complex business logic and dependencies

  • 3+ years of CI/CD automation and configuration experience (DevOps / pipeline automation)

  • 3+ years of experience with ITSM processes (e.g., Incident Management, Change Management, Asset Management, and Configuration Management)

  • Hands-on experience writing and maintaining technical documentation such as fixlogs, runbooks, knowledge base articles, and architectural diagrams

  • Hands-on experience with system administration across multiple platforms

  • Hands-on experience with one or more software development languages: Java, JavaScript, Ruby, Python, JSON, Angular, NodeJS, .Net/C#

  • Hands-on experience with one or more CI/CD automation tools: Jenkins, Gradle, Maven, Git, SonarQube, Artifactory, Ansible, Puppet, Apigee

  • Hands-on experience with one or more process management and scheduling tools: Autosys, JAWS

  • Hands-on experience with one or more Monitoring/Observability/APM/Analytics tools: Splunk, Elastic, Kibana, Grafana, Prometheus, AppDynamics, Dynatrace, New Relic, Datadog, Kafka, CloudWatch, Jaeger, Zipkin, BigPanda, TrueSight

  • Hands-on experience with one or more Server OS: Windows, Linux, Unix, Mainframe

  • Hands-on experience with one or more Cloud and virtualization technologies: Azure, GCP, AWS, PCF, PKS, Kubernetes, OpenShift, VMware

  • Hands-on experience with one or more Data storage, management and messaging technologies: Kafka, IBM MQ, Apache Airflow, Logstash, Spark, Oracle, SQL, MongoDB, Cassandra, Hadoop, Cloudera, AWS EMR, S3

  • Hands-on experience with one or more Testing Frameworks: Selenium, JMeter, Blazemeter, Performance Center, Perfecto, Cucumber, Gherkin, ALM, Gremlin, Chaos Monkey, Chaos Toolkit, Simian Army, Toxiproxy

  • Working knowledge of TCP/IP networking, including experience analyzing packet captures to assist in troubleshooting

  • Working knowledge of Internet technologies: routing, NAT, firewalls, load-balancing, proxies, web servers

JOB EXPECTATIONS:

  • Ability to work additional hours as needed

  • Ability to work on call as assigned

  • Flexibility to work in a 16/7 environment, including weekends and holidays

Operational Ownership / Application Support:

  • Maintain system operational knowledge (functional and technical)

  • Understand and monitor system operation, ensure optimal availability, functional health, and performance (driven by SLO/SLA)

  • Triage alerts, respond to incidents, perform root cause analysis (troubleshooting)

  • Handle users' questions and requests related to business systems (this is not a Desktop Support role)

  • Implement change requests (manual deployment steps, overall deployment coordination)

  • BCP planning and implementation

  • Ensure continuous improvements of operational processes and methods

Reliability Engineering:

  • Analyze the system's monitoring and observability needs (technical, functional, business), and create or adjust logging, monitoring, alerting, and analytics solutions to cover those needs

  • Use understanding of software engineering (system code) and infrastructure to improve the depth and quality of root cause analysis (troubleshooting)

  • Partner with Architecture, Infrastructure and Development teams to influence decisions that impact reliability and supportability

  • Identify routine or risky manual operations, and create automation solutions (scripting, tooling) or influence fixing the sources of manual work (as appropriate)

  • Drive deeper post-incident reviews for major incidents, to learn and improve

  • Engage in weakness research and analysis, and architectural reviews, to use deep knowledge of production operation to suggest improvements

  • Use deep knowledge of production operation to create detailed high-quality stories and tasks on DEV owners' backlog, with the focus on reliability and supportability

  • Ensure continuous improvements of systems' reliability and supportability
