EOS RPO
Sr Software Engineer
Role Objective
As a Senior Data Engineer, you will design, build, and optimize large-scale data pipelines that bridge the gap between legacy on-prem systems and modern cloud environments. You will be responsible for implementing high-performance storage layers using Apache Iceberg and ensuring data consistency across Hive and cloud-native platforms (Azure or GCP).
Key Responsibilities
1. Data Pipeline Development (ETL/ELT)
Design and implement complex ETL workflows using Python and PySpark to process petabytes of data.
Build scalable data ingestion pipelines from diverse sources, including Oracle, SQL Server, and Teradata.
Manage job scheduling and workflow orchestration using Airflow or Autosys to ensure 24/7 data availability.
2. Data Lakehouse Engineering (Iceberg & Hive)
Must-Have: Architect and manage Apache Hive metastores and tables.
Implement Apache Iceberg for ACID transactions, time travel, and schema evolution on the data lake.
Optimize storage formats and partitioning strategies to reduce cloud compute costs and improve query performance.
3. Semantic Layer & Modeling
Develop and maintain the Semantic Layer to provide business-ready data abstractions for BI tools.
Translate complex business logic into efficient SQL queries and Spark transformations.
Ensure data quality and lineage are maintained throughout the pipeline from source to consumption.
4. Cloud Infrastructure & Modernization
Deploy and manage data resources within Azure (ADLS, Synapse, Databricks) or GCP (BigQuery, GCS, Dataproc).
Lead the migration of legacy ETL logic (e.g., Ab Initio) into modern Spark-based cloud frameworks.
Technical Skills
Languages: Python (Advanced), SQL (Expert).
Big Data Frameworks: Apache Spark (PySpark/Scala), Hive (Internal/External tables).
Table Formats: Apache Iceberg (Hands-on experience with snapshots and manifests).
Database Systems: Experience with RDBMS (Oracle, SQL Server) and MPP (Teradata).
Orchestration: Airflow (Preferred) or Autosys.
Cloud: Hands-on experience in Azure or GCP.
Legacy Tools: Knowledge of Ab Initio is a significant plus for migration projects.