Data Engineer

This role is for a Data Engineer with an unspecified contract length and pay rate. Key skills include Java, Python (PySpark), Apache Spark, Apache Airflow, and AWS services. Industry experience with Change Data Capture (CDC) is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
February 18, 2025
🕒 - Project duration
Unknown
🏝️ - Location type
Unknown
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
Reston, VA
🧠 - Skills detailed
#Data Ingestion #Big Data #S3 (Amazon Simple Storage Service) #Data Processing #ACID (Atomicity, Consistency, Isolation, Durability) #Batch #SQL (Structured Query Language) #Spark (Apache Spark) #Apache Spark #Java #Data Lake #AWS Lambda #Python #Data Catalog #Spark SQL #Storage #Data Engineering #Datasets #ETL (Extract, Transform, Load) #Airflow #Data Quality #Data Pipeline #AWS (Amazon Web Services) #PySpark #Lambda (AWS Lambda) #Scala #Apache Airflow #Databases
Role description

Role: Data Engineer

We are seeking a highly skilled Data Engineer to set up Change Data Capture (CDC) for multiple database types to support data lake hydration. The ideal candidate should have hands-on experience with Debezium or other CDC frameworks and strong expertise in ETL transformations using Apache Spark for both streaming and batch data processing.
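
To make the shape of the pipeline concrete, below is a minimal PySpark sketch of the streaming half, assuming Debezium publishes CDC events to Kafka and the raw zone lives on S3. The broker, topic, JSON path, and bucket paths are illustrative assumptions, not details from the posting.

```python
# Minimal sketch: read Debezium CDC events from Kafka with Spark Structured
# Streaming and land them, unmodified, in a raw S3 zone for later batch
# processing. Requires the spark-sql-kafka connector on the classpath.
# Broker, topic, and S3 paths are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cdc-raw-ingest").getOrCreate()

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
    .option("subscribe", "cdc.inventory.customers")      # assumed Debezium topic
    .option("startingOffsets", "latest")
    .load()
)

# Keep the full Debezium envelope as a string and pull out the operation
# code (c=create, u=update, d=delete). The JSON path depends on the
# connector's converter settings; "$.payload.op" assumes the default JSON
# converter with schemas enabled.
changes = (
    raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "timestamp",
    )
    .withColumn("op", F.get_json_object("value", "$.payload.op"))
)

query = (
    changes.writeStream.format("parquet")
    .option("path", "s3://example-data-lake/raw/customers/")                     # assumed path
    .option("checkpointLocation", "s3://example-data-lake/checkpoints/customers/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

A downstream batch job would then merge these raw change events into structured, analytics-ready tables, which is what the responsibilities below describe.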

Key Responsibilities
• Implement Change Data Capture (CDC) for diverse databases to enable real-time and batch data ingestion.
• Develop ETL pipelines using Apache Spark (PySpark/Java) to transform raw CDC data into structured, analytics-ready datasets.
• Work with Apache Spark DataFrames, Spark SQL, and Spark Streaming to build scalable data pipelines.
• Optimize data workflows for performance, reliability, and scalability in a big data environment.
• Utilize Apache Airflow to orchestrate data pipelines and schedule workflows (see the DAG sketch after this list).
• Leverage AWS services for data ingestion, storage, transformation, and processing (e.g., S3, Glue, EMR, Lambda, Step Functions, MWAA).
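
As referenced above, here is a minimal Airflow 2.x DAG sketch of that orchestration step; the DAG id, schedule, and spark-submit invocations are illustrative assumptions, not details from the posting.

```python
# Minimal Airflow 2.x DAG sketch: a nightly batch job that compacts the raw
# CDC landing zone into analytics-ready tables, followed by a data quality
# check. DAG id, schedule, and the spark-submit commands are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="cdc_batch_compaction",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",      # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
    tags=["cdc", "data-lake"],
) as dag:

    compact_customers = BashOperator(
        task_id="compact_customers",
        bash_command=(
            "spark-submit --deploy-mode cluster "
            "s3://example-data-lake/jobs/compact_cdc.py "
            "--table customers --ds {{ ds }}"
        ),
    )

    data_quality_check = BashOperator(
        task_id="data_quality_check",
        bash_command=(
            "spark-submit --deploy-mode cluster "
            "s3://example-data-lake/jobs/dq_checks.py "
            "--table customers --ds {{ ds }}"
        ),
    )

    compact_customers >> data_quality_check
```

In an MWAA setup, these BashOperator tasks would typically be swapped for operators from the Amazon provider package (e.g., an EMR Serverless job-run operator), but the orchestration pattern is the same.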

Required Skills
• Java: Mid to senior-level experience.
• Python (PySpark): Mid-level experience.
• Apache Spark: Proficiency in DataFrames, Spark SQL, Spark Streaming, and ETL pipelines.
• Apache Airflow: Experience managing and scheduling workflows.
• AWS Expertise:
  • S3 (CRUD operations)
  • EMR & EMR Serverless
  • Glue Data Catalog
  • Step Functions
  • MWAA (Managed Workflows for Apache Airflow)
  • AWS Lambda (Python-based; see the sketch after this list)
  • AWS Batch
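
For illustration, here is a small boto3 sketch of the kind of Python-based Lambda the S3 and Lambda items above imply; the bucket name and event shape are assumptions, not part of the posting.

```python
# Minimal Python Lambda sketch covering basic S3 CRUD with boto3 -- the kind
# of glue code the S3 and Lambda skills above imply. Bucket name and event
# shape are placeholder assumptions.
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-lake"   # assumed bucket name


def lambda_handler(event, context):
    """Route simple create/read/update/delete actions against S3 objects."""
    action = event.get("action")
    key = event["key"]

    if action in ("create", "update"):
        # Create and update are both PutObject calls; S3 overwrites in place.
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event["body"]))
        return {"status": "written", "key": key}

    if action == "read":
        obj = s3.get_object(Bucket=BUCKET, Key=key)
        return {"status": "ok", "body": obj["Body"].read().decode("utf-8")}

    if action == "delete":
        s3.delete_object(Bucket=BUCKET, Key=key)
        return {"status": "deleted", "key": key}

    raise ValueError(f"Unsupported action: {action}")
```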

Nice-to-Have Skills (Bonus)
• Scala for Spark development.
• Apache Hudi for incremental data processing and ACID transactions (see the sketch below).
• Apache Griffin for data quality and validation.
• Performance tuning and optimization in big data environments.
• AWS Deequ for automated data quality checks on Spark datasets (not required, but a plus).
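
As a sketch of the Hudi bonus skill, the snippet below upserts deduplicated CDC records into a Hudi table so updates and deletes land as ACID transactions on the data lake; the table name, record key, precombine field, and paths are assumptions.

```python
# Sketch: upsert deduplicated CDC records into an Apache Hudi table so that
# updates and deletes are applied as ACID transactions on the data lake.
# Requires the hudi-spark bundle on the classpath; names and paths are
# illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("cdc-hudi-upsert")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Assume this DataFrame holds the latest record per primary key, produced
# by the batch compaction step.
latest = spark.read.parquet("s3://example-data-lake/staging/customers_latest/")

hudi_options = {
    "hoodie.table.name": "customers",
    "hoodie.datasource.write.recordkey.field": "customer_id",   # assumed key
    "hoodie.datasource.write.precombine.field": "ts_ms",        # latest wins
    "hoodie.datasource.write.operation": "upsert",
}

(
    latest.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://example-data-lake/curated/customers/")
)
```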