

Lead Data Engineer | San Francisco, CA | W2 only | No C2C/1099
Responsibilities
Development Tasks:
• Collect metrics based on user interactions.
• Visualize data for business teams.
• Develop and redesign data pipelines using Kafka streams.
• Implement solutions using Java with Spring Boot and Spark streaming on Databricks.
Leadership Duties:
• Lead the measurement processes from requirements gathering to production delivery.
• Collaborate with other team leads, business partners, and product managers.
• Balance hands-on engineering (50%) with team leadership (50%).
Collaboration Structure:
• Onsite: Lead role (this position).
• Nearshore: Senior developer.
• Offshore: Data engineer.
Lead Data Engineer - Job Description
Required Skills & Experience:
• Hands-on coding mindset, with a deep understanding of the technologies involved and the ability to see the larger picture.
• Sound knowledge of architectural patterns, best practices, and non-functional requirements.
• 8-10 years of overall experience in high-volume data processing, data platforms, data lakes, big data, data warehouses, or equivalent.
• 5+ years of experience with strong proficiency in Python and Spark (must-have).
• 3+ years of hands-on experience in ETL workflows using Spark and Python.
• 4+ years of experience with large-scale data loads, feature extraction, and data processing pipelines in different modes: batch, near-real-time, and real-time.
• Solid understanding of data quality and data accuracy concepts and practices.
• 3+ years of solid experience building and deploying ML models in production. Ability to adapt quickly and handle data preprocessing, feature engineering, and model engineering as needed.
• Preferred: Experience with one or more Python deep learning libraries, such as PyTorch, TensorFlow, Keras, or equivalent.
• Preferred: Prior experience working with LLMs and transformers. Must be able to work through all phases of model development as needed.
• Experience integrating with various data stores, including:
    • SQL/NoSQL databases
    • In-memory stores like Redis
    • Data lakes (e.g., Delta Lake)
• Experience with Kafka streams, producers, and consumers.
• Required: Experience with Databricks or a similar data lake / data platform.
• Required: Java and Spring Boot experience in data processing, both near-real-time and batch.
• Familiarity with notebook-based environments such as Jupyter Notebook.
• Adaptability: Must be open to learning new technologies and approaches.
• Initiative: Ability to take ownership of tasks, learn independently, and innovate.
• Ability and willingness to learn new technologies as the landscape changes, and to deliver results on the job.
Preferred Skills:
• Ability to pivot from conventional approaches and develop creative solutions.