
Spark Developer / Engineer

This role is for a Spark Developer/Engineer on a 6-12 month contract, working remotely during PST hours. Required skills include Apache Spark (PySpark/Scala), Scalding, Hadoop, SQL, and data migration experience. US Citizens and Green Card holders only.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
February 15, 2025
🕒 - Project duration
More than 6 months
🏝️ - Location type
Remote
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
United States
🧠 - Skills detailed
#Big Data #Python #Apache Spark #Apache Airflow #Data Migration #Scala #Batch #Spark (Apache Spark) #Data Transformations #ETL (Extract, Transform, Load) #PySpark #Data Processing #Hadoop #SQL (Structured Query Language) #Data Pipeline #Migration #Airflow
Role description

Job Title: Spark Developer / Engineer (2 positions)

Location: US Remote, work during PST time zone

Duration: 6-12 Months

Rate: DOE

US Citizens and Green Card holders only. No third-party agencies. Corp-to-Corp.

Workflows are powered by offline batch jobs written in Scalding, a MapReduce-based framework. To enhance scalability and performance, these jobs are being migrated from Scalding to Apache Spark.
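
A minimal sketch of the kind of migration involved, using a toy word-count pipeline: a legacy Scalding job and a roughly equivalent Spark (Scala) job. The class names, paths, and the pipeline itself are hypothetical and not taken from the actual codebase.

```scala
// Legacy side: a toy Scalding (MapReduce-based) job. Hypothetical example only.
import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))
    .flatMap(_.split("\\s+"))                 // tokenize each line
    .groupBy(identity)                        // group identical words
    .size                                     // count occurrences per word
    .write(TypedTsv[(String, Long)](args("output")))
}

// Migrated side: the same logic expressed with the Spark Dataset API.
import org.apache.spark.sql.SparkSession

object WordCountSparkJob {
  def main(cli: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("WordCountSparkJob").getOrCreate()
    import spark.implicits._

    spark.read.textFile(cli(0))
      .flatMap(_.split("\\s+"))               // tokenize each line
      .groupByKey(identity)
      .count()                                // Dataset[(String, Long)], same shape as the Scalding output
      .write.csv(cli(1))

    spark.stop()
  }
}
```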

Key Responsibilities:

Understanding the Existing Scalding Codebase

Analyze the current Scalding-based data pipelines.

Document existing business logic and transformations.

Migrating the Logic to Spark

Convert existing Scalding jobs into Spark (PySpark/Scala) while ensuring optimized performance.

Refactor data transformations and aggregations in Spark.

Optimize Spark jobs for efficiency and scalability.
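
As an illustration of the kind of refactoring and tuning this involves, the sketch below joins a hypothetical large events table to a small users dimension with a broadcast join, controls shuffle partitioning, and caches the intermediate result for reuse. The table names, paths, and partition counts are assumptions.

```scala
// Hypothetical tuning of a migrated Spark job; paths, schemas, and sizes are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col}

object EnrichEventsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("EnrichEventsJob").getOrCreate()

    val events = spark.read.parquet("/data/events") // assumed large fact table
    val users  = spark.read.parquet("/data/users")  // assumed small dimension table

    val enriched = events
      .join(broadcast(users), Seq("user_id"))       // broadcast the small side to avoid a shuffle join
      .repartition(200, col("user_id"))             // size the shuffle for the aggregation below
      .cache()                                      // reuse across multiple actions

    enriched.groupBy("user_id").count()
      .write.mode("overwrite").parquet("/data/events_per_user")

    spark.stop()
  }
}
```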

Ensuring Data Parity & Validation

Develop data parity tests to compare outputs between Scalding and Spark implementations.

Identify and resolve any discrepancies between the two versions.

Work with stakeholders to validate correctness.
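
A minimal sketch of one way such a parity check could be implemented, assuming both the legacy and migrated outputs can be read with identical schemas; the Parquet format and the paths below are assumptions.

```scala
// Sketch of a duplicate-aware parity check between legacy and migrated outputs.
// The output paths and the Parquet format are assumptions.
import org.apache.spark.sql.SparkSession

object ParityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ParityCheck").getOrCreate()

    val legacy   = spark.read.parquet("/data/output_scalding") // hypothetical legacy output
    val migrated = spark.read.parquet("/data/output_spark")    // hypothetical Spark output

    // Rows present in one output but not in the other, keeping duplicates.
    val missing    = legacy.exceptAll(migrated).count()
    val unexpected = migrated.exceptAll(legacy).count()

    require(missing == 0 && unexpected == 0,
      s"Parity mismatch: $missing rows missing from Spark output, $unexpected unexpected rows")

    spark.stop()
  }
}
```

In practice, a full row-level diff like this is often complemented by key-level aggregate checks (row counts or sums per date or partition) so that any discrepancy can be localized quickly.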

Writing Unit Tests & Improving Code Quality

Implement robust unit and integration tests for Spark jobs.

Ensure code meets engineering best practices (modular, reusable, and well-documented).
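
As a sketch of what such tests could look like, the example below exercises a small aggregation against a local SparkSession, assuming ScalaTest; the test framework and the transformation under test are illustrative assumptions.

```scala
// Unit-test sketch for a Spark aggregation, assuming ScalaTest and a local SparkSession.
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class RevenuePerUserSpec extends AnyFunSuite {
  private lazy val spark =
    SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()

  test("sums amounts per user") {
    import spark.implicits._

    val input = Seq(("u1", 10.0), ("u1", 5.0), ("u2", 3.0)).toDF("user_id", "amount")

    val result = input.groupBy("user_id").sum("amount")
      .collect()
      .map(row => row.getString(0) -> row.getDouble(1))
      .toMap

    assert(result == Map("u1" -> 15.0, "u2" -> 3.0))
  }
}
```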

Required Qualifications:

Experience in big data processing with Apache Spark (PySpark or Scala).

Strong experience with data migration from legacy systems to Spark.

Proficiency in Scalding and MapReduce frameworks.

Experience with Hadoop, Hive, and distributed data processing.

Hands-on experience in writing unit tests for Spark pipelines.

Strong SQL and data validation experience.

Proficiency in Python and Scala.

Knowledge of CI/CD pipelines for data jobs.

Familiarity with the Apache Airflow orchestration tool.