
Data Engineer

⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Engineer on a 12+ month contract, located in Dallas, TX or Charlotte, NC (hybrid). Key skills include Python, SQL, Spark, and legacy ETL tools. Cloud experience, preferably GCP, is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
March 29, 2025
🕒 - Project duration
More than 6 months
🏝️ - Location type
Hybrid
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
Charlotte, NC
🧠 - Skills detailed
#Spark SQL #Batch #Spark (Apache Spark) #Python #Dremio #Oracle #Teradata #Data Pipeline #BigQuery #SQL (Structured Query Language) #Data Engineering #Databases #SSIS (SQL Server Integration Services) #AWS (Amazon Web Services) #GCP (Google Cloud Platform) #SQL Server #Ab Initio #ETL (Extract, Transform, Load) #Virtualization #Azure #Data Architecture #Informatica #Cloud
Role description

Required Skills: Python, ETL, Spark, SQL, AWS

Job Description

Job Title: Data Engineer

Role Type: 12+ Month Contract

Location: Dallas, TX or Charlotte, NC (hybrid onsite)

REQUIRED

  1. SQL

  2. Spark

  3. Python

  4. Legacy ETL tooling (SSIS preferred; Ab Initio, Informatica, etc.)

  5. Cloud experience (AWS/GCP/Azure, etc.)

Project Overview: The team enables data for reporting and downstream consumption (both operational and analytics workstreams). It is currently on a legacy platform and working to stand up an environment in Google Cloud.

Tech Requirements:

SQL

Python

Spark

Legacy ETL tooling (SSIS preferred, Ab Initio, Informatica)

Cloud experience (cloud-native warehousing; GCP preferred, Azure/AWS acceptable; does not need to be a Cloud Data Engineer)

Work Overview:

The team has over 150 data sources: mostly on-prem relational databases (SQL Server, some Oracle), Teradata, and some files.

Existing data pipelines are batch-driven, using SSIS, Ab Initio, and Informatica, and the hiring manager's existing team has experience in these tools. The role involves refactoring existing data movements and ETL jobs into Python/Spark pipelines.

They do not own the Python/Spark framework and will not be making modifications to it, but will be adopting it for their data pipelines.

The majority of the initial work will be around migrating SSIS packages to Spark, which calls for strong SQL skills.
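
For context, a refactor of that kind might look as follows in PySpark. This is only a sketch; the host, database, table, credentials, and the transform itself are hypothetical, not from the posting:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ssis_refactor_example").getOrCreate()

# Read a source table from an on-prem SQL Server over JDBC,
# mirroring what an SSIS data-flow source component would do.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://onprem-host:1433;databaseName=sales")  # hypothetical host/db
    .option("dbtable", "dbo.orders")                                        # hypothetical table
    .option("user", "svc_etl")                                              # hypothetical credentials
    .option("password", "<secret>")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)

# Express the package's transformation logic in Spark SQL -- this is
# where the strong SQL skills carry over directly.
orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT order_date, region, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date, region
""")

# Land the result as partitioned Parquet in place of the legacy destination.
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet("/data/curated/daily_totals")
```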

In tandem, the data architecture team will be setting up the GCP environment; pipelines will eventually be "rerouted" to BigQuery/Bigtable, and Dataproc will be introduced. Dremio or Starburst will be used for virtualization, though the choice is not finalized yet.
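
As an illustration of that eventual "reroute", a Spark job can write to BigQuery through the open-source spark-bigquery connector, which is available on Dataproc clusters. The project, dataset, bucket, and input path below are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bigquery_reroute_example").getOrCreate()

# Assume a curated DataFrame produced by one of the refactored pipelines.
daily_totals = spark.read.parquet("/data/curated/daily_totals")  # hypothetical path

# Indirect write via the spark-bigquery connector, staging through a GCS bucket.
(
    daily_totals.write.format("bigquery")
    .option("table", "my-project.analytics.daily_totals")  # hypothetical project.dataset.table
    .option("temporaryGcsBucket", "my-staging-bucket")      # hypothetical staging bucket
    .mode("overwrite")
    .save()
)
```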

The pipelines follow a traditional medallion architecture: data is ingested into the bronze/raw layer, the silver layer serves more operational workflows, and the gold layer serves reporting/analytics.
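
A minimal PySpark sketch of that bronze/silver/gold flow, with hypothetical lake paths, column names, and cleaning rules:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("medallion_example").getOrCreate()

# Bronze: land source data as-is, adding only ingestion metadata.
bronze = (
    spark.read.parquet("/lake/landing/orders")  # hypothetical landing path
    .withColumn("_ingested_at", F.current_timestamp())
)
bronze.write.mode("append").parquet("/lake/bronze/orders")

# Silver: cleaned, deduplicated records for operational workflows.
silver = (
    spark.read.parquet("/lake/bronze/orders")
    .dropDuplicates(["order_id"])               # hypothetical key
    .filter(F.col("amount").isNotNull())
)
silver.write.mode("overwrite").parquet("/lake/silver/orders")

# Gold: aggregated, report-ready tables for analytics consumers.
gold = (
    spark.read.parquet("/lake/silver/orders")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
)
gold.write.mode("overwrite").parquet("/lake/gold/orders_by_region")
```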

Processing is mostly batch today, but the team will move into event-driven architecture a little down the road.
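
When that shift happens, the batch reads above could be swapped for Spark Structured Streaming sources. A hedged sketch, assuming a Kafka event stream (the posting does not name one) and requiring Spark's Kafka integration package; broker, topic, and paths are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("event_driven_example").getOrCreate()

# Read an event stream instead of a batch extract.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "orders")                     # hypothetical topic
    .load()
)

# Stream raw payloads into the bronze layer, with checkpointing
# so the file sink can recover exactly-once output on restart.
query = (
    events.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream.format("parquet")
    .option("path", "/lake/bronze/orders_events")        # hypothetical path
    .option("checkpointLocation", "/lake/_chk/orders_events")
    .start()
)
query.awaitTermination()
```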