

Senior Data Engineer
Job Description
We are seeking a skilled Senior Data Engineer with strong PySpark expertise to join our data engineering team. The ideal candidate will have deep experience with Apache Spark (PySpark) for large-scale data processing, strong SQL and Python skills, and hands-on experience with big data platforms such as Hadoop, Databricks, or AWS Glue. The role involves building scalable ETL pipelines, optimizing data workflows, and ensuring data integrity for analytical and machine learning applications.
Key Responsibilities:
• Develop and optimize ETL pipelines in PySpark for data ingestion, transformation, and processing (a brief illustrative sketch follows this list).
• Design, implement, and maintain big data solutions on platforms such as Hadoop, Databricks, or AWS Glue.
• Work with structured and unstructured data from multiple sources, ensuring data quality and consistency.
• Collaborate with data scientists, analysts, and business teams to understand data requirements and deliver efficient solutions.
• Tune Spark jobs for performance, cost efficiency, and scalability.
• Implement data security and governance best practices.
• Troubleshoot issues related to data pipelines, Spark jobs, and distributed computing environments.
• Automate workflows using Apache Airflow, cron jobs, or other orchestration tools (see the Airflow sketch after this list).
• Write clean, maintainable, and well-documented PySpark code following best practices.
• Stay up to date with the latest developments in big data, cloud computing, and distributed systems.
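
For candidates who want a concrete picture of the day-to-day work, here is a minimal sketch of the kind of PySpark ETL pipeline this role involves: ingest raw data, apply basic cleansing transformations, and write a partitioned, analytics-ready output. The storage paths, table and column names, and job name are hypothetical examples, not references to our actual systems.

```python
# Illustrative sketch only: paths, columns, and names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders_etl")  # hypothetical job name
    .getOrCreate()
)

# Ingest: read raw CSV files (placeholder path).
raw = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("s3://example-bucket/raw/orders/")
)

# Transform: deduplicate, filter out invalid rows, derive a date column.
orders = (
    raw.dropDuplicates(["order_id"])
    .filter(F.col("amount") > 0)
    .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet for downstream analytics;
# repartitioning by the partition column keeps file counts manageable.
(
    orders.repartition("order_date")
    .write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/orders/")
)

spark.stop()
```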
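
As an example of the orchestration work, here is a minimal Apache Airflow sketch (assuming Airflow 2.4+) that schedules the pipeline above as a daily job via spark-submit. The DAG id, schedule, and script path are illustrative assumptions.

```python
# Illustrative sketch only: DAG id, schedule, and script path are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_etl_daily",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",              # Airflow 2.4+ parameter name
    catchup=False,
) as dag:
    # Submit the PySpark job sketched above.
    run_etl = BashOperator(
        task_id="run_orders_etl",
        bash_command="spark-submit /opt/jobs/orders_etl.py",
    )
```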