

Spark Developer / Engineer
Job Title: Spark Developer / Engineer (2 positions)
Location: US Remote, working within the PST time zone
Duration: 6-12 Months
Rate: DOE
US Citizens and Green Card holders only. No third-party agencies or Corp-to-Corp.
Workflows are powered by offline batch jobs written in Scalding, a MapReduce-based framework. To enhance scalability and performance, these jobs are being migrated from Scalding to Apache Spark.
Key Responsibilities:
Understanding the Existing Scalding Codebase
Analyze the current Scalding-based data pipelines (a hypothetical example of such a job is sketched below).
Document existing business logic and transformations.
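For context, here is a hedged sketch of the kind of Scalding job to be analyzed, written against the typed API. The job name, fields, and paths are illustrative assumptions only, not drawn from the actual codebase:

import com.twitter.scalding._

// Illustrative Scalding job: reads (userId, amount) pairs,
// sums the amounts per user, and writes the totals.
// Input/output paths are supplied as command-line args.
class EventRollupJob(args: Args) extends Job(args) {
  TypedPipe.from(TypedTsv[(String, Long)](args("input")))
    .sumByKey
    .write(TypedTsv[(String, Long)](args("output")))
}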
Migrating the Logic to Spark
Convert existing Scalding jobs to Spark (PySpark/Scala) while ensuring optimized performance (see the conversion sketch below).
Refactor data transformations and aggregations in Spark.
Optimize Spark jobs for efficiency and scalability.
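As a rough illustration of the conversion work, the hypothetical Scalding rollup above might become the following Spark (Scala) job. This is a minimal sketch that assumes a Parquet source with user_id and amount columns; the real schemas, formats, and tuning will differ:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Minimal Spark translation of the hypothetical Scalding rollup above.
// Paths are passed as arguments; column names are placeholders.
object EventRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("EventRollup").getOrCreate()

    // Read the same input the legacy job consumed.
    val events = spark.read.parquet(args(0))

    // The per-user sum that Scalding expressed as sumByKey.
    val totals = events
      .groupBy("user_id")
      .agg(sum("amount").as("total_amount"))

    totals.write.mode("overwrite").parquet(args(1))
    spark.stop()
  }
}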
Ensuring Data Parity & Validation
Develop data parity tests to compare outputs between the Scalding and Spark implementations (see the parity-check sketch below).
Identify and resolve any discrepancies between the two versions.
Work with stakeholders to validate correctness.
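One simple way to frame such a parity test is sketched below, under the assumption that both outputs can be loaded as DataFrames with identical schemas; the object name and argument paths are hypothetical:

import org.apache.spark.sql.{DataFrame, SparkSession}

// Parity check sketch: counts rows present in one output but not the other.
// exceptAll is duplicate-preserving, so row multiplicity is compared too.
object ParityCheck {
  def diffCounts(legacy: DataFrame, migrated: DataFrame): (Long, Long) =
    (legacy.exceptAll(migrated).count(), migrated.exceptAll(legacy).count())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ParityCheck").getOrCreate()
    val legacy   = spark.read.parquet(args(0)) // Scalding job output
    val migrated = spark.read.parquet(args(1)) // Spark job output

    val (onlyLegacy, onlyMigrated) = diffCounts(legacy, migrated)
    println(s"only in legacy: $onlyLegacy, only in migrated: $onlyMigrated")
    require(onlyLegacy == 0 && onlyMigrated == 0, "parity check failed")
    spark.stop()
  }
}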
Writing Unit Tests & Improving Code Quality
Implement robust unit and integration tests for Spark jobs (a minimal test sketch follows below).
Ensure code meets engineering best practices (modular, reusable, and well-documented).
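A minimal unit-test sketch for the rollup logic above, using a local SparkSession with ScalaTest (one common choice; the team's actual test stack may differ):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum
import org.scalatest.funsuite.AnyFunSuite

// Exercises the aggregation against a tiny in-memory dataset.
class EventRollupSpec extends AnyFunSuite {
  private val spark = SparkSession.builder()
    .master("local[2]")
    .appName("EventRollupSpec")
    .getOrCreate()

  import spark.implicits._

  test("sums amounts per user") {
    val events = Seq(("u1", 10L), ("u1", 5L), ("u2", 7L)).toDF("user_id", "amount")

    val totals = events
      .groupBy("user_id")
      .agg(sum("amount").as("total_amount"))
      .collect()
      .map(r => r.getString(0) -> r.getLong(1))
      .toMap

    assert(totals == Map("u1" -> 15L, "u2" -> 7L))
  }
}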
Required Qualifications:
Experience in big data processing with Apache Spark (PySpark or Scala).
Strong experience with data migration from legacy systems to Spark.
Proficiency in Scalding and MapReduce frameworks.
Experience with Hadoop, Hive, and distributed data processing.
Hands-on experience in writing unit tests for Spark pipelines.
Strong SQL and data validation experience.
Proficiency in Python and Scala.
Knowledge of CI/CD pipelines for data jobs.
Familiarity with the Apache Airflow orchestration tool.