EOS RPO
System Engineer
1. Production Support & ITIL Management
Incident Management: Lead the technical response to high-priority incidents, ensuring restoration of service within defined SLAs and driving post-incident reviews.
Problem Management: Perform Root Cause Analysis (RCA) to identify recurring issues and implement permanent fixes.
Change & Release: Manage application deployments in production environments, ensuring every change follows ITIL governance to minimize risk.
2. Observability & Log Monitoring
Splunk Engineering: Create and optimize complex Splunk queries, alerts, and dashboards to monitor application health and security logs.
Network & Synthetics: Use ThousandEyes to monitor network paths and end-to-end user experience, identifying bottlenecks that lie outside the internal network.
Proactive Monitoring: Transform raw log data into actionable insights to predict system failures before they impact the business.
3. Automation & Workload Orchestration
Scripting: Develop and maintain shell scripts (Bash/Korn) to automate routine maintenance tasks, log rotation, and health checks.
Job Scheduling: Design, implement, and troubleshoot complex batch job workflows using AutoSys. Ensure job dependencies and schedules are optimized for available system resources.
Deployment Automation: Assist in automating deployment pipelines so that releases are repeatable and less error-prone.