

Data Engineer
Required Skills: Python, ETL, Spark, SQL, AWS
Job Description
Job Title: Data Engineer
Role Type: 12+ Month
Location: Dallas TX, Charlotte NC (Hybrid onsite)
REQUIRED
- SQL
- Spark
- Python
- Legacy ETL tooling (SSIS preferred, Ab Initio, Informatica etc.)
- Cloud experience (AWS/GCP/Azure etc.)
Project Overview: The team enables data for reporting and downstream consumption (both operational and analytics workstreams). They are currently on a legacy platform and working to set up in Google Cloud.
Tech Requirements:
SQL
Python
Spark
Legacy ETL tooling (SSIS preferred, Ab Initio, Informatica)
Cloud experience (cloud-native warehousing; GCP preferred, Azure/AWS acceptable. This does not need to be a dedicated Cloud Data Engineer.)
Work Overview:
The team has over 150 data sources: mostly on-prem relational databases (SQL Server, some Oracle), Teradata, and some files.
Existing data pipelines are batch-driven, using SSIS, Ab Initio, and Informatica, and the existing team has experience in these tools. The role involves refactoring existing data movements and ETL jobs into Python/Spark pipelines.
The team does not own the Python/Spark framework and will not be making modifications to it; they are adopting it for their data pipelines.
The majority of initial work will center on migrating SSIS packages to Spark, which requires strong SQL skills.
In tandem, the data architecture team will be setting up the GCP environment; pipelines will eventually be "rerouted" to BigQuery/Bigtable, and Dataproc will be introduced. Either Dremio or Starburst will be used for data virtualization (not yet finalized).
The team follows a traditional medallion architecture: data is ingested into the bronze/raw layer, the silver layer serves more operational workflows, and the gold layer supports reporting/analytics.
Processing is mostly batch today, with plans to move toward event-driven architecture down the road.