Data Engineer

eTeam

Role Description

Posted on: Sep 12, 2024

Data Engineer

Solihull - Needs to be on site 2 days a week

Rate: 500-520/day (inside IR35, via umbrella)

Key skills (most important): Spark, Airflow, Hadoop, Oozie, Docker, Kubernetes containerization, COS (cloud object storage), Scala, SQL, Dremio, Git, GitLab, Jenkins, Bash

About the Role:

As a Data Engineer at BNP Paribas, you will be instrumental in designing, building, and optimizing data processing pipelines on "DATAHUB", a centralized data platform that consolidates, federates, and enhances massive data assets for use cases including reporting, analytics, and machine learning. You will work with multiple data sources, ensuring the seamless integration, transformation, and quality of data, while also migrating the existing Hadoop infrastructure to cloud environments.

Key Responsibilities:

  • Data Integration: Integrate data from multiple sources and formats into the Raw Layer of the DATAHUB.

  • Data Modeling and Pipeline Development: Model data and develop efficient pipelines that enrich and transform large volumes of data according to complex business rules, automating pipelines and streamlining ingestion. Design and implement scalable, secure data processing pipelines using Scala, Spark, Hadoop, and cloud object storage (see the first sketch after this list).

  • Data Transformation and Quality: Implement data transformation and quality control processes to ensure data consistency and accuracy, using languages such as Scala and SQL and tools like Spark for transformation and enrichment operations.

  • Scheduling with Airflow: Schedule data processing tasks using Airflow.

  • Validation Testing: Write and run unit and validation tests to ensure the accuracy and integrity of the code developed (see the second sketch after this list).

  • CI/CD Pipeline Implementation: Set up CI/CD pipelines to automate deployment, unit testing, and development management.

  • Documentation: Write technical documentation (specifications, operational documents) to ensure knowledge capitalization.

  • Code Improvement: Understand the existing code, modify it as business requirements evolve, continuously improve it for performance and maintainability, and prepare the relevant documentation.

  • Infrastructure Migration: Migrate the existing Hadoop infrastructure to cloud infrastructure built on Kubernetes Engine, object storage, Spark as a service, and Airflow as a service.

  • Performance Optimization and Security: Ensure the performance and security of the data infrastructure and follow data engineering best practices.

  • Production Support and Maintenance: Contribute to production support, correct incidents and anomalies, troubleshoot data-related issues, and implement functional and technical changes to keep production processes stable.

  • Team Collaboration: Work closely with data squads and business teams to understand data needs and provide tailored solutions.

  • Agile Experience: understanding of agile principles and rituals.

  • Software Development Lifecycle (SDLC) awareness
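To make the pipeline bullets above concrete, here is a minimal sketch of the kind of Spark/Scala job the role describes: read raw-layer data from cloud object storage, apply a quality filter and a business rule, and write the enriched result to a curated layer. The paths, column names, and the business rule itself are hypothetical illustrations, not part of the actual DATAHUB codebase.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EnrichTrades {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("datahub-enrich-trades")
      .getOrCreate()

    // Raw layer: data landed from upstream sources (hypothetical path).
    val raw = spark.read.parquet("s3a://datahub/raw/trades/")

    val enriched = raw
      // Quality control: reject rows missing mandatory keys.
      .filter(col("trade_id").isNotNull && col("amount").isNotNull)
      // Example business rule: normalise currency codes, flag large trades.
      .withColumn("currency", upper(col("currency")))
      .withColumn("is_large", col("amount") > lit(1000000))

    // Curated layer, partitioned for downstream reporting and analytics.
    enriched.write
      .mode("overwrite")
      .partitionBy("trade_date")
      .parquet("s3a://datahub/curated/trades/")

    spark.stop()
  }
}
```

In practice a job like this would typically be packaged as a JAR, containerized with Docker, and scheduled as an Airflow task, matching the scheduling and containerization skills listed above.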
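And a matching sketch of the unit/validation testing the role mentions, here using ScalaTest with a local SparkSession (ScalaTest is an assumption; the posting only says "unit and validation tests"). It checks the hypothetical quality rule from the previous sketch.

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class EnrichTradesSpec extends AnyFunSuite {
  // Local SparkSession so the test runs without a cluster.
  private val spark = SparkSession.builder()
    .appName("enrich-trades-test")
    .master("local[2]")
    .getOrCreate()
  import spark.implicits._

  test("rows missing mandatory keys are rejected") {
    val input = Seq(
      (Some("T1"), Some(2000000.0)),
      (None, Some(500.0)) // missing trade_id: should be filtered out
    ).toDF("trade_id", "amount")

    // Same quality filter as in the pipeline sketch.
    val valid = input.filter($"trade_id".isNotNull && $"amount".isNotNull)
    assert(valid.count() === 1)
  }
}
```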

Apply Now