EOS RPO
Senior Software Engineer
We are looking for a seasoned Data Engineer to join our Data Engineering & Analytics team. The ideal candidate brings deep expertise in big data processing, with a specialization in the Python/Spark ecosystem and modern table formats such as Apache Iceberg.
You will be responsible for building scalable ETL pipelines, migrating legacy workloads to the cloud, and ensuring high-performance data delivery across our Azure or GCP environments.
### Key Responsibilities

Data Pipeline Development: Design, develop, and maintain complex ETL/ELT pipelines using Python and PySpark to process massive datasets.
Modern Data Architecture: Implement and optimize data storage solutions using Apache Iceberg and Hive to ensure ACID compliance and high-performance querying.
Cloud Transformation: Architect and deploy data solutions on cloud platforms (Azure or GCP), leveraging native services for storage and compute.
Legacy Integration & Migration: Integrate diverse data sources including Oracle, Teradata, and traditional SQL databases.
Orchestration: Automate and monitor complex workflows using enterprise-grade scheduling tools such as AutoSys or Apache Airflow.
Semantic Layering: Develop and maintain semantic models to ensure data consistency and ease of use for downstream BI and analytics teams.
Optimization: Perform deep-dive performance tuning on SQL queries and Spark jobs to reduce latency and cloud consumption costs.
### Must-Have Expertise
Programming & Processing: Expert-level Python and Apache Spark (PySpark/Spark SQL).
Table Formats & Metadata: Hands-on experience with Apache Iceberg and Hive (Metastore, partitioning, and optimization).
Advanced SQL: Mastery of complex SQL across multiple dialects (PostgreSQL, T-SQL, etc.).
Cloud Platforms: Proven experience in either Microsoft Azure (ADLS, Databricks, Synapse) or Google Cloud Platform (BigQuery, Dataproc, GCS).
Orchestration: Proficiency in AutoSys or Apache Airflow for job scheduling and dependency management.
### Highly Desirable (Good to Have)
Legacy ETL Tools: Experience with Ab Initio is a significant plus for legacy system migrations.
Enterprise Warehousing: Working knowledge of Teradata or large-scale Oracle environments.
### General Requirements

Experience: 6+ years in Data Engineering or Big Data development.
Database Knowledge: Deep understanding of relational (RDBMS) vs. non-relational data structures.
Problem Solving: Ability to debug distributed systems and optimize data sharding/partitioning strategies.
Best Practices: Strong grasp of CI/CD, version control (Git), and data validation frameworks.