EOS RPO
Site Reliability Engineer (2324847)
Job Title: Site Reliability Engineer (2324847)
What You’ll Do
Programming, tooling and automation experience in one or more of the following: Golang, Java, Python, TypeScript, Node.js and Shell.
Good understanding of Kafka internals, SQL/NoSQL databases like Cassandra, Elasticsearch and PostgreSQL, and in-memory caching frameworks like Memcached.
Influence, design and create new architectures, standards and methods for large-scale enterprise systems.
Design, write and build tools to improve the reliability, latency, availability and scalability of Walmart e-commerce/Retail and Enterprise products.
Engender reliability and availability starting with metrics and measurements.
Enable scaling by providing tools, developing training and/or augmenting processes.
Build tools and automation to prevent recurrence of problems in mission-critical products and services.
Participate in capacity planning, demand forecasting, software performance analysis and system tuning.
Engage with enterprise and business/infrastructure functions to establish, track and optimize operational metrics and targets in line with SRE principles (SLOs/SLIs, latency percentiles, error budgets, tech debt and alerting guidelines).
What You’ll Bring
Bachelor's or Master’s degree in Computer Science or a related field, with 6+ years of experience.
Proficiency in at least one programming language such as Java or Golang.
Experience in designing, investigating, analysing and troubleshooting large-scale enterprise systems.
Methodical and systematic problem-solving approach, combined with a strong sense of ownership, initiative and drive.
Fluency in running services at scale; in-depth understanding of Unix system internals and networking.
Experience with IaaS and PaaS providers such as AWS, Azure, OpenStack and GCP.
Experience with containerisation and container platforms (e.g., Docker, Kubernetes, Docker EE, OpenShift, Mesosphere).
Experience with enterprise monitoring solutions like AppDynamics, New Relic, Prometheus, Graphite, Grafana, Nagios, Sensu and Splunk.