Data Engineer

This role is for a Data Engineer with 12+ years of experience in CDC, Apache Spark, and AWS. It’s an onsite/hybrid position in Plano, Texas, offering a competitive pay rate. Key skills include Java, Python (PySpark), and AWS services.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
February 14, 2025
🕒 - Project duration
Unknown
🏝️ - Location type
Hybrid
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
Plano, TX
🧠 - Skills detailed
#Data Transformations #Databases #SQL (Structured Query Language) #Lambda (AWS Lambda) #AWS (Amazon Web Services) #PySpark #Data Processing #ETL (Extract, Transform, Load) #AWS Glue #Airflow #Data Architecture #AWS S3 (Amazon Simple Storage Service) #Apache Airflow #Python #AWS Lambda #S3 (Amazon Simple Storage Service) #Data Pipeline #Java #Apache Spark #Scala #Data Engineering #Computer Science #Data Catalog #Cloud #Spark (Apache Spark) #Spark SQL #Batch #AWS EMR (Amazon Elastic MapReduce) #Datasets #Big Data #Data Lake
Role description

Job Title: Data Engineer (CDC, Apache Spark, ETL, AWS)

Client: KFroce

Location: Plano, Texas and Reston, VA (Candidates must be local to Texas or Virginia)

Type: Onsite/Hybrid Opportunity

Job Summary:

We are seeking a highly skilled Data Engineer with expertise in Change Data Capture (CDC) and data pipeline development to join our team at KFroce. The ideal candidate will have experience setting up and managing CDC for multiple types of databases to hydrate a data lake, along with proficiency in building ETL transformations using Apache Spark. This role requires a solid understanding of both batch and streaming data pipelines, as well as hands-on experience with data processing, optimization, and performance tuning in a Big Data environment. Familiarity with AWS services and cloud-based data architectures is essential. This is an onsite and hybrid opportunity, and candidates must be local to Texas.
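As a rough illustration of the CDC-to-data-lake flow described above, here is a minimal PySpark Structured Streaming sketch that reads Debezium-style change events from a Kafka topic and lands them in S3. The broker, topic name, envelope schema, and bucket paths are hypothetical placeholders rather than details of this role, and the Spark Kafka connector package is assumed to be available on the cluster.

# Minimal sketch: stream Debezium-style CDC events from Kafka into a data lake.
# Broker, topic, schema, and S3 paths are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("cdc-hydration-sketch").getOrCreate()

# Simplified Debezium envelope: only the fields this sketch needs.
envelope = StructType([
    StructField("op", StringType()),     # c = create, u = update, d = delete
    StructField("after", StringType()),  # row image after the change, as JSON
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "cdc.orders")                 # placeholder topic
    .load()
)

changes = raw.select(from_json(col("value").cast("string"), envelope).alias("evt"))

query = (
    changes.select("evt.op", "evt.after")
    .writeStream
    .format("parquet")
    .option("path", "s3://example-lake/raw/orders/")                 # placeholder path
    .option("checkpointLocation", "s3://example-lake/_chk/orders/")  # placeholder path
    .start()
)
query.awaitTermination()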

Key Responsibilities:
• Design and implement Change Data Capture (CDC) solutions using Debezium or other CDC tools for various databases.
• Build and maintain data pipelines for streaming and batch processing with Apache Spark using DataFrames, Spark SQL, and Spark Streaming.
• Perform data transformations and develop ETL jobs to ensure efficient data movement and integration into a data lake.
• Collaborate with data teams to design scalable, optimized solutions for large-scale data processing.
• Work with Apache Airflow to orchestrate data pipelines and automate workflows (a minimal DAG sketch follows this list).
• Utilize AWS cloud services to build robust and scalable data pipelines.
• Work with AWS services like S3, EMR, Glue Data Catalog, Step Functions, Lambda, MWAA, and AWS Batch to optimize data workflows.
• Troubleshoot performance issues and optimize the processing of large datasets to ensure high-performance ETL workflows.
• Keep up to date with emerging technologies in Big Data and cloud services.
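As one example of the orchestration responsibility above, here is a minimal Airflow DAG sketch, assuming a recent Airflow 2.x install, that schedules a daily Spark batch job. The DAG id, schedule, and spark-submit command are hypothetical placeholders, not this team's actual pipeline.

# Minimal Airflow DAG sketch: schedule a daily Spark batch ETL step.
# DAG id, schedule, and the spark-submit command are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_lake_etl_sketch",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # Airflow 2.4+ keyword; older 2.x versions use schedule_interval
    catchup=False,
) as dag:
    run_spark_etl = BashOperator(
        task_id="run_spark_etl",
        bash_command="spark-submit --deploy-mode cluster etl_job.py",  # placeholder command
    )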

Skills & Qualifications:

Technical Skills:
• Java: Mid- to senior-level proficiency in Java.
• Python (PySpark): Mid-level experience working with Python and PySpark for data processing.
• Apache Spark: Strong experience with Spark DataFrames, Spark SQL, Spark Streaming, and building ETL pipelines (see the batch sketch after this list).
• Apache Airflow: Experience in managing and automating workflows using Apache Airflow.
• Big Data Concepts: Understanding of performance tuning and optimization in large-scale data processing environments.
• Scala (Optional): Familiarity with Scala is a plus.
• Apache Hudi & Apache Griffin (Optional): Knowledge of Apache Hudi or Apache Griffin is a plus.
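To make the Spark expectations above concrete, the sketch below shows a small batch transformation that mixes the DataFrame API with Spark SQL. The table, column, and path names are hypothetical placeholders, not datasets from this engagement.

# Minimal batch ETL sketch combining Spark DataFrames and Spark SQL.
# Input/output paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

orders = spark.read.parquet("s3://example-lake/raw/orders/")  # placeholder input
orders.createOrReplaceTempView("orders")

# Example transformation: daily revenue per customer via Spark SQL.
daily_revenue = spark.sql("""
    SELECT customer_id, order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY customer_id, order_date
""")

(
    daily_revenue
    .filter(col("revenue") > 0)  # drop zero/negative days
    .write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-lake/curated/daily_revenue/")  # placeholder output
)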

AWS Services:
• Extensive knowledge of AWS S3, including CRUD operations.
• Experience with AWS EMR & EMR Serverless.
• Familiarity with AWS Glue Data Catalog.
• Knowledge of AWS Step Functions for orchestration.
• Experience with AWS MWAA (Managed Workflows for Apache Airflow).
• Proficient in AWS Lambda (Python); a minimal handler sketch follows this list.
• Experience with AWS Batch for running jobs.
• Familiarity with AWS Deequ (optional).
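As a small illustration of the Lambda and S3 items above, here is a hedged sketch of a Python Lambda handler that performs basic S3 create/read/delete calls with boto3. The bucket and key names are hypothetical placeholders.

# Minimal AWS Lambda (Python) sketch exercising basic S3 CRUD-style calls with boto3.
# Bucket and key names are illustrative placeholders.
import json

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    bucket = "example-lake"                           # placeholder bucket
    key = event.get("key", "manifests/latest.json")   # placeholder key

    # Create/update: write a small JSON object.
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps({"status": "ok"}))

    # Read it back.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Delete when no longer needed.
    s3.delete_object(Bucket=bucket, Key=key)

    return {"statusCode": 200, "body": body.decode("utf-8")}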

Desired Experience:
• 12+ years of experience in data engineering or related roles, with hands-on experience in CDC, Apache Spark, and AWS-based data pipeline development.
• Familiarity with big data tools and techniques for processing and optimizing large datasets.

Education & Certifications:
• Bachelor’s degree in Computer Science, Engineering, or related field.
• Relevant certifications (AWS, Apache Spark) are a plus.

Location:

Plano, Texas (Candidates must be local to Texas; onsite/hybrid opportunity available)