Databricks Support Engineer

This role is for a "Databricks Support Engineer" with a contract length of "unknown," offering a pay rate of "unknown." Key skills include Databricks engineering, machine learning pipelines, Python/SQL debugging, and BigQuery integration. Experience in ML Ops is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date discovered
January 14, 2025
🕒 - Project duration
Unknown
🏝️ - Location type
Unknown
📄 - Contract type
Unknown
🔒 - Security clearance
Unknown
📍 - Location detailed
Chicago, IL
🧠 - Skills detailed
#MLflow #BigQuery #Python #Data Pipeline #ML (Machine Learning) #SQL (Structured Query Language) #Spark (Apache Spark) #Monitoring #Debugging #ML Ops (Machine Learning Operations) #Data Engineering #AI (Artificial Intelligence) #Data Ingestion #SQL Queries #Scala #Databricks #Data Integration #Data Science
Role description

Databricks Support Engineer

Needs ML Exposure!
• Needs strong Databricks engineering skills; the team runs machine learning pipelines that pull data from BigQuery, perform modeling and inference, and push results back to BigQuery (see the pipeline sketch after this list)
• (Mandatory) - Supporting day-to-day Databricks jobs, troubleshooting Databricks pipelines, and debugging Python/SQL code.
• (Basic knowledge) - Databricks admin: cluster management, cluster policies
• (Add-on) - Machine learning pipelines using MLflow in Databricks
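
For orientation, here is a minimal sketch of the kind of pipeline described above, assuming the spark-bigquery-connector is installed on the cluster and a model is registered in MLflow; the table names, GCS bucket, and model URI are hypothetical placeholders:

```python
# Sketch of a Databricks pipeline: read from BigQuery, score with an MLflow
# model, write results back to BigQuery. Table names, the staging GCS bucket,
# and the model URI are hypothetical placeholders.
import mlflow.pyfunc

# Read the feature table via the spark-bigquery-connector (`spark` is the
# session Databricks provides in notebooks and jobs).
features = (
    spark.read.format("bigquery")
    .option("table", "my-project.my_dataset.features")  # placeholder
    .load()
)

# Wrap a registered MLflow model as a Spark UDF for distributed inference.
predict = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/guest_propensity/Production"  # placeholder
)
scored = features.withColumn(
    "prediction", predict(*features.drop("guest_id").columns)
)

# Push predictions back to BigQuery; the connector stages data through GCS.
(
    scored.write.format("bigquery")
    .option("table", "my-project.my_dataset.predictions")  # placeholder
    .option("temporaryGcsBucket", "my-staging-bucket")     # placeholder
    .mode("overwrite")
    .save()
)
```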

Position:

The Senior Databricks Data Engineer is responsible for managing and optimizing machine learning operations and Databricks solutions in a dynamic and fast-paced environment. This role involves collaborating with data engineers, analysts, and data scientists to deliver robust and innovative AI solutions that enhance the Ulta guest experience.

Mandatory Skills

Machine Learning Pipeline & Job Troubleshooting
• Job Debugging: Investigate failed or stuck Databricks jobs, analyze logs, and debug errors in job execution (e.g., Python exceptions, Spark-related issues) in the machine learning pipelines that pull data from BigQuery, perform modeling and inference, and push results back to BigQuery (a triage sketch follows this list).
• SQL and Python Debugging: Assist with debugging SQL queries and Python scripts used within Databricks notebooks or jobs, ensuring that data pipelines are running smoothly and correctly.
• Pipeline Optimization: Help teams optimize their data pipelines for performance, scalability, and reliability in Databricks, identifying issues like resource overuse, memory leaks, or slow execution.
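
As one way to approach the job-debugging work above, a sketch using the databricks-sdk Python client to pull recent failed runs of a job and surface their error output; the job ID is a hypothetical placeholder and workspace credentials are assumed to be configured:

```python
# Sketch: triage recent failed runs of a Databricks job with the
# databricks-sdk client (pip install databricks-sdk). Assumes credentials
# are configured via env vars or ~/.databrickscfg; JOB_ID is a placeholder.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import RunResultState

JOB_ID = 123456  # placeholder

w = WorkspaceClient()

for run in w.jobs.list_runs(job_id=JOB_ID, completed_only=True):
    state = run.state.result_state if run.state else None
    if state != RunResultState.FAILED:
        continue
    print(f"Run {run.run_id} failed: {run.state.state_message}")
    # Each task in the run carries its own output/traceback.
    for task in run.tasks or []:
        output = w.jobs.get_run_output(task.run_id)
        if output.error:
            print(f"  task {task.task_key}: {output.error}")
```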

Day-to-Day Databricks Operations Support & Performance Tuning
• Regularly monitor the health and performance of Databricks jobs, clusters, and pipelines. Ensure that clusters are properly sized, running efficiently, and scaling as needed.
• Diagnose and resolve issues related to jobs, notebooks, and clusters, including performance bottlenecks, failures, and resource constraints. This often involves debugging both Python and SQL code used in Databricks environments.
• Provide support to data scientists, data engineers, and other users by helping them with their Databricks workspaces, job configurations, code issues, and cluster configurations.
• Respond to and resolve support tickets related to Databricks failures, performance degradation, connectivity issues, etc. Keep stakeholders informed about the status of incidents.
• Regularly analyze the performance of Databricks clusters and jobs, using Databricks' built-in monitoring tools (e.g., Spark UI, Ganglia, Databricks Metrics), and suggest improvements.
• Work with data teams to ensure that Spark jobs running in Databricks are optimized for speed, resource usage, and cost.
• Monitor and optimize costs by managing cluster scaling, idle time, and instance selection for both Databricks jobs and clusters (see the policy sketch after this list).
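
To make the cost levers above concrete, here is a sketch of a cluster policy that caps idle time and autoscaling, created via the databricks-sdk; the policy name, limits, and instance type are hypothetical:

```python
# Sketch: a cluster policy enforcing auto-termination and capping cluster
# size, created via the databricks-sdk. The policy name and the specific
# limits are hypothetical; real values depend on workload and budget.
import json
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

policy_definition = {
    # Force clusters to shut down after at most 60 idle minutes.
    "autotermination_minutes": {
        "type": "range",
        "maxValue": 60,
        "defaultValue": 30,
    },
    # Cap autoscaling so a runaway job cannot grab unbounded workers.
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    # Pin an instance family to keep spend predictable (placeholder type).
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge"]},
}

policy = w.cluster_policies.create(
    name="cost-guardrails",  # placeholder
    definition=json.dumps(policy_definition),
)
print(f"Created policy {policy.policy_id}")
```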

System & Data Integrations
• Data Ingestion and Export: Support the integration of Databricks with other data systems (e.g., BigQuery), ensuring smooth data flows between Databricks and external sources/sinks.
• Job Scheduling: Configure and troubleshoot job scheduling, ensuring that jobs are triggered as expected, monitoring job completion, and handling failures appropriately (a scheduling sketch follows).
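
One way this kind of scheduling is set up: a sketch defining a nightly Databricks job via the databricks-sdk, with failure notifications so broken runs surface quickly. The notebook path, cluster ID, cron expression, and notification address are hypothetical placeholders.

```python
# Sketch: define a scheduled Databricks job that runs a pipeline notebook
# nightly, using the databricks-sdk. The notebook path, cluster id, cron
# expression, and notification address are hypothetical placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import (
    CronSchedule,
    JobEmailNotifications,
    NotebookTask,
    Task,
)

w = WorkspaceClient()

job = w.jobs.create(
    name="nightly-bigquery-scoring",  # placeholder
    tasks=[
        Task(
            task_key="score",
            notebook_task=NotebookTask(
                notebook_path="/Pipelines/score_guests"  # placeholder
            ),
            existing_cluster_id="0123-456789-abcdefgh",  # placeholder
        )
    ],
    # Quartz cron: 02:00 every day, in the timezone given below.
    schedule=CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",
        timezone_id="America/Chicago",
    ),
    # Alert on failure so stuck/failed runs surface as support tickets do.
    email_notifications=JobEmailNotifications(
        on_failure=["data-support@example.com"]  # placeholder
    ),
)
print(f"Created job {job.job_id}")
```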