

Data Engineer (Python, Scala, PySpark, AWS) - W2 Only #953368
• No subcontractors or C2C candidates, please
• Location: Chicago (on W. Wacker, near all trains for an easy commute) – Hybrid, 3x/week onsite
• Duration: 1 year+ with probable extension well into 2026 or conversion to a permanent role
• Compensation: $62-$68/hr plus benefits and PTO
• Benefits: Full-time contract employees receive medical, dental, and vision coverage; health savings accounts; medical and dependent care flexible spending accounts; voluntary life and disability benefits; a 401(k) plan; and PTO and sick time.
Key Responsibilities:
• Data Pipeline Development:
Design, develop, and maintain robust data pipelines using Scala and PySpark on AWS EMR to ingest, clean, transform, and load data from various sources (databases, APIs, flat files) into data lakes (S3) and data warehouses (Redshift); a minimal sketch of this kind of pipeline follows this list.
• AWS Integration:
Utilize AWS services such as S3, Glue, Lambda, and Kinesis for data storage, data processing, and real-time data streaming.
• Data Modeling:
Design data models and schemas for data warehouses and data lakes to optimize data access and analysis.
• ETL/ELT Processes:
Develop complex ETL/ELT workflows using Python and PySpark to extract, transform, and load data across different systems.
• Performance Optimization:
Analyze and optimize data pipelines for performance and scalability, identifying bottlenecks and implementing improvements.
• Data Quality Assurance:
Implement data quality checks and validation procedures to ensure the accuracy and consistency of data throughout the pipeline.
• Collaboration:
Work closely with data analysts, business stakeholders, and other engineers to understand data requirements and translate them into technical solutions.
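To illustrate the kind of pipeline work described above, here is a minimal PySpark sketch: it ingests raw CSV from S3, applies basic cleaning and a data-quality check, and writes partitioned Parquet back to the data lake. Bucket names, paths, and column names are hypothetical placeholders, not details from this role.

# Minimal PySpark pipeline sketch (hypothetical names and paths throughout)
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("orders-daily-pipeline")  # hypothetical job name
    .getOrCreate()
)

# Ingest: read raw CSV landed in the data lake (placeholder path)
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://example-raw-bucket/orders/2025-01-01/")
)

# Clean/transform: deduplicate, standardize types, derive a partition column
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("order_id").isNotNull())
)

# Data-quality check: fail fast if required fields contain nulls
null_count = cleaned.filter(
    F.col("customer_id").isNull() | F.col("amount").isNull()
).count()
if null_count > 0:
    raise ValueError(f"Data quality check failed: {null_count} rows with null keys")

# Load: write partitioned Parquet to the curated zone of the data lake
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/")
)

spark.stop()

A downstream load into Redshift would typically follow the same pattern, reading the curated Parquet and loading it via COPY or a Spark-Redshift connector.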
Required Skills:
• Programming Languages: Proficient in Scala, Python, and PySpark
• AWS Expertise: Deep understanding of AWS services including S3, EMR, Glue, Redshift, Kinesis, and Lambda
• Big Data Concepts: Familiarity with distributed computing concepts, data partitioning, and optimization techniques for large datasets
• SQL Proficiency: Strong SQL skills to query and manipulate data in relational databases
• Data Engineering Tools: Experience with data engineering tools such as Apache Spark, Hive, and Airflow (a minimal scheduling sketch follows this list)
• Version Control: Proficiency in Git for code management
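For context on the orchestration tools listed above, here is a minimal Airflow sketch showing how a daily PySpark job might be scheduled. The DAG id, schedule, and spark-submit path are hypothetical placeholders.

# Minimal Airflow DAG sketch (hypothetical DAG id, schedule, and job path)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_daily_pipeline",  # hypothetical DAG name
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Submit the PySpark job (e.g., the pipeline sketched earlier) to the cluster
    run_pipeline = BashOperator(
        task_id="run_orders_pipeline",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "s3://example-artifacts-bucket/jobs/orders_pipeline.py"
        ),
    )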