Scala Developer
Job Description
We are seeking a highly skilled individual with expertise in technical strategy, schema design, and solution development for survey databases, covering the full path from raw input to analytics-ready output. The ideal candidate will demonstrate leadership and technical proficiency in managing end-to-end data processes, leveraging cloud services, and ensuring data quality. Below are the detailed requirements:
Key Responsibilities:
Database Design and Technical Strategy:
• Develop schema designs and relationships for the entire survey database lifecycle:
  • Raw data layer schema and entity relationships.
  • Structured respondent-level schema adhering to best data management practices (e.g., star schema, snowflake schema, survey-specific modifications).
  • Output data schema reflecting dimensionality and metric differentiation principles to meet analytics requirements.
Data Ingestion, Processing, and Transformations:
• Manage raw, unstructured data ingestion through APIs, data dumps, and other methods.
• Parse and transform raw data into structured tabular formats.
• Perform data aggregations and transformations to create analytics-ready datasets.
• Employ a modular scripting approach featuring parameterization and reusable functions for fast and reliable modifications.
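As a rough illustration of the modular scripting approach described above, the following Scala sketch parses raw delimited survey rows into a typed respondent-level structure and aggregates them with a reusable, parameterized function. All names (`SurveyResponse`, `parseLine`, `aggregateBy`) are hypothetical and for illustration only, not taken from an actual codebase.

```scala
// Hypothetical typed record for one structured survey answer.
case class SurveyResponse(respondentId: String, question: String, answer: Int)

// Parse one raw delimited line into a structured record; malformed rows
// are dropped rather than failing the whole batch. The delimiter is a
// parameter so the same function serves multiple raw formats.
def parseLine(line: String, delimiter: String = ","): Option[SurveyResponse] =
  line.split(delimiter).map(_.trim) match {
    case Array(id, q, a) if a.nonEmpty && a.forall(_.isDigit) =>
      Some(SurveyResponse(id, q, a.toInt))
    case _ => None
  }

// Reusable aggregation parameterized by grouping key and metric,
// so new analytics-ready outputs need no new boilerplate.
def aggregateBy[K](rows: Seq[SurveyResponse],
                   key: SurveyResponse => K,
                   metric: Seq[Int] => Double): Map[K, Double] =
  rows.groupBy(key).map { case (k, rs) => k -> metric(rs.map(_.answer)) }

val raw    = Seq("r1,q1,4", "r2,q1,2", "bad row", "r1,q2,5")
val parsed = raw.flatMap(parseLine(_))   // the malformed row is skipped
val meanByQuestion =
  aggregateBy(parsed, _.question, xs => xs.sum.toDouble / xs.size)
// meanByQuestion contains q1 -> 3.0 and q2 -> 5.0
```

Swapping the key or metric function (e.g. grouping by respondent, or computing a median) reuses the same pipeline unchanged, which is the kind of fast, reliable modification the role calls for.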
Quality Assurance Frameworks:
• Design and implement quality assurance frameworks, processes, and rules for survey data.
• Automate QA rules to monitor data availability, consistency, integrity, and adherence to business logic.
• Develop an evolving QA monitoring and alert system tied to specific SOPs.
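One way to picture the automated QA rules above is as a registry of named predicates over a batch, where each failure emits an alert that an SOP-driven monitoring system could route. This is a minimal hypothetical sketch; the rule names, thresholds, and types are illustrative assumptions, not a prescribed design.

```scala
// Hypothetical respondent record for QA checks.
case class Respondent(id: String, age: Int)

// A QA rule pairs a human-readable name with a predicate over the batch.
case class QaRule(name: String, check: Seq[Respondent] => Boolean)

val rules = Seq(
  QaRule("non-empty batch",        rows => rows.nonEmpty),                              // availability
  QaRule("unique respondent ids",  rows => rows.map(_.id).distinct.size == rows.size),  // integrity
  QaRule("ages within 18-99",      rows => rows.forall(r => r.age >= 18 && r.age <= 99)) // business logic
)

// Run every rule; each failure becomes an alert message tied to its rule name.
def runQa(rows: Seq[Respondent]): Seq[String] =
  rules.collect { case rule if !rule.check(rows) => s"ALERT: rule '${rule.name}' failed" }

val batch  = Seq(Respondent("r1", 34), Respondent("r1", 150))
val alerts = runQa(batch)
// flags the duplicate id and the out-of-range age
```

Because rules are plain data, the rule set can evolve (new checks appended, thresholds tuned) without touching the runner, which suits an "evolving QA monitoring and alert system" as described.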
Cloud Services Expertise:
• Extensive knowledge of Azure Cloud services for data and analytics use cases.
• Experience with Azure Synapse components, including ADF, Workspaces, Scripts, and Databases.
• Proficiency with relational SQL databases in the cloud.
• Hands-on experience with Logic Apps, Storage Accounts, and other cloud orchestration services.
• Manage and optimize compute and storage allocation to achieve cost efficiency.
Databricks Platform:
• Proficient in data management, cluster management, notebooks, and orchestration within Databricks.
• Extensive programming experience with Python and/or Scala and SQL at an enterprise scale.
• Develop and deploy functional, modular, and efficient code emphasizing clarity, brevity, and speed.
ETL and Pipelines:
• Create, manage, and deploy Data Factory pipelines, including parameterization of inputs/outputs to simplify and optimize ETL processes.
• Orchestrate pipelines to ensure uniqueness and continuity of ETL operations.
• Build and manage CI/CD pipelines.
Version Control and Deployment:
• Experience with GitHub for version control, including branch management (Local/Dev/Test/Prod).
• Build and deploy ETL code and data object artifacts using GitHub integrated with the Azure Cloud environment.
Other Skills and Knowledge:
• Knowledge of Dynamics 365 and experience operating services using Dynamics ticketing and CRM management systems.