
Senior Data Engineer

This role is for a Senior Data Engineer; the contract length and pay rate are not specified. Key skills include PySpark, SQL, and Python, plus experience with Hadoop, Databricks, or AWS Glue. Strong ETL and data-integrity expertise is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
February 8, 2025
🕒 - Project duration
Unknown
🏝️ - Location type
Unknown
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
Phoenix, AZ
🧠 - Skills detailed
#"ETL (Extract #Transform #Load)" #Data Engineering #Distributed Computing #Data Ingestion #Airflow #Scala #Security #Databricks #Hadoop #Data Pipeline #Data Processing #Big Data #PySpark #Data Quality #AWS Glue #Apache Spark #AWS (Amazon Web Services) #Cloud #Data Integrity #Data Security #SQL (Structured Query Language) #Data Science #Spark (Apache Spark) #ML (Machine Learning) #Python #Apache Airflow
Role description

JD

We are seeking a skilled PySpark Engineer to join our data engineering team. The ideal candidate will have expertise in Apache Spark (PySpark) for large-scale data processing, strong SQL and Python skills, and experience working with big data platforms such as Hadoop, Databricks, or AWS Glue. This role involves building scalable ETL pipelines, optimizing data workflows, and ensuring data integrity for analytical and machine learning applications.
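For illustration, here is a minimal sketch of the kind of PySpark ETL pipeline the role describes: ingesting raw data, applying basic data-quality rules, and writing curated output for analytics. The bucket paths, column names, and rules are assumptions for the example, not details from the posting.

```python
# Minimal PySpark ETL sketch (illustrative only; paths and columns are assumed).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Ingest: read raw structured data landed in a data lake path.
raw = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("s3://example-bucket/raw/orders/")
)

# Transform: enforce basic data-quality rules and derive analytical columns.
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_total").isNotNull() & (F.col("order_total") >= 0))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet for downstream analytics and ML consumption.
(
    clean.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/orders/")
)
```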

Key Responsibilities:
• Develop and optimize ETL pipelines using PySpark for data ingestion, transformation, and processing.
• Design, implement, and maintain big data solutions on platforms such as Hadoop, Databricks, or AWS Glue.
• Work with structured and unstructured data from multiple sources, ensuring data quality and consistency.
• Collaborate with data scientists, analysts, and business teams to understand data requirements and deliver efficient solutions.
• Optimize Spark jobs for performance tuning, cost reduction, and scalability.
• Implement data security and governance best practices.
• Troubleshoot issues related to data pipelines, Spark jobs, and distributed computing environments.
• Automate workflows using Apache Airflow, cron jobs, or other orchestration tools (a minimal Airflow sketch follows this list).
• Write clean, maintainable, and well-documented PySpark code following best practices.
• Stay up to date with the latest developments in big data, cloud computing, and distributed systems.
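As noted in the orchestration bullet above, here is a minimal sketch of scheduling a daily PySpark job with Apache Airflow. It assumes Airflow 2.4+; the DAG id, schedule, script path, and spark-submit command are illustrative assumptions, and in practice the task might trigger a Databricks or AWS Glue job instead.

```python
# Minimal Airflow DAG sketch for a daily PySpark ETL run (illustrative only).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submit the PySpark job; swap this for a Databricks/Glue operator as needed.
    run_etl = BashOperator(
        task_id="run_orders_etl",
        bash_command="spark-submit /opt/jobs/orders_etl.py",
    )
```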