IT Data/Analytics Engineer
The IT Data/Analytics Engineer position requires experience designing, developing, testing and deploying efficient and scalable AI/ML solutions for life sciences data and analytics.
This position requires strong AWS SageMaker AI/ML skills, including supporting tools such as Dask, Athena, AWS Glue, Redshift, and EMR, to build data and analytical insights from Company data sets.
Focus on data and analytics supporting Translational Research and Chemistry, Manufacturing and Controls (CMC), including data collection and analysis tools that aggregate research, clinical, and manufacturing data sets.
Work in development environments including Python, Pandas, SQL, Jupyter Notebooks, JupyterLab, MongoDB, and Posit.
Support data science colleagues in provisioning curated data sets for the creation of data visualization dashboards in Tableau.
The position requires enthusiasm, passion, attention to detail and a desire to create new medicines for Company’s patients.
Liaise with data scientists, data engineers, and cloud solution hosting partners to implement data-driven AI/ML solutions for drug discovery analytics and patient cohort development.
Work with large clinical real-world evidence data sets in Parquet or JSON formats.
Create Python scripts and algorithms to clean, transform, and extract data from multiple sources, and provide these outputs to the Data Science Team.
Gather computational requirements and collaborate with Company’s AWS cloud provider on sizing solutions and performance tuning of Notebooks, EC2, and storage solutions.
Conduct regular code reviews with data engineers and data scientists. Manage GitHub code repositories.
Coordinate closely with the Director of Data Science on agile sprint planning and release management.
Requirements Description
Education/Training: Advanced degree in a quantitative field such as Management Information Systems, Computer Science, or Machine Learning, or equivalent experience.
Experience: 3-5 years of experience designing and building AI solutions.
Licenses
Skills/Abilities: Relational and non-relational database experience with a proven ability to model, design, and optimize data structures.
Expert knowledge of Structured Query Language (SQL), with a proven ability to author and optimize complex queries, is required.
High competency using data science tools such as AWS SageMaker AI/ML and Jupyter notebooks.
Experience implementing and using one or more managed data services such as Amazon Athena, AWS Glue, Amazon S3, Amazon RDS, and Amazon EMR.
Python programming expertise with experience using common data analysis libraries such as Dask, Pandas, NumPy, PySpark, etc., as well as boto3 for AWS integrations.
Experience writing shell scripts (e.g., Bash) to automate processes is desirable but not required. Experience with the AWS CLI is a plus.
Experience with distributed data processing and management systems.
Working knowledge of R and Posit IDEs.
Data governance, data modeling, and data mining experience is desirable.
Experience working with and implementing cloud computing services such as Amazon EC2, AWS Fargate, and AWS Lambda is desirable.
Experience building Tableau dashboard visualizations is desirable.