Practical data science on Databricks: scaling end-to-end machine learning on Databricks

Develop the must-have skills required for any data scientist to get the best results from Azure Databricks.

Key Features

  • Learn to develop and productionize ML pipelines using the Databricks Unified Analytics platform
  • See how to use AutoML, Feature Stores, and MLOps with Databricks
  • Get a complete understanding of data governance and model deployment

Book Description

In this book, you'll get to grips with Databricks, enabling you to power up your organization's data science applications. We'll walk through applying the Databricks AI and ML stack to real-world use cases for natural language processing, computer vision, time series data, and more. We'll dive deep into the complete model development life cycle for data ingestion and analysis, and get familiar with the latest offerings of AutoML, Feature Store, and MLStudio on the Databricks platform.

You'll get hands-on experience implementing repeatable ML operations (MLOps) pipelines using MLflow, tracking model training and key metrics, and exploring real-time ML, anomaly detection, and streaming analytics with Delta Lake and Spark Structured Streaming.

Starting with an overview of Data Science use cases across different organizations and industries, you will then be introduced to feature stores, feature tables, and how to access them.

You will see why AutoML is important and how to create a baseline model with AutoML within Databricks.

The book also covers using the MLflow Model Registry to manage model versioning and transitions to production, along with detecting and guarding against model drift in production environments.

By the end of the book, you will know how to set up your Databricks ML development and deployment as a CI/CD pipeline.

What you will learn

  • Perform natural language processing, computer vision, and more
  • Explore AutoML, Feature Store, and MLStudio on Databricks
  • Dive deep into the complete model development life cycle
  • Experience implementing repeatable MLOps pipelines using MLflow
  • Track model training and key metrics
  • Explore real-time ML, anomaly detection, and streaming analytics
  • Learn how to handle model drift

Who This Book Is For

This book focuses specifically on the tools that cater to the data scientist persona.

Readers who want to learn how to build and deploy end-to-end data science projects using the cloud-agnostic Databricks unified analytics platform will benefit from this book, as will AI and machine learning practitioners.

Table of Contents

  1. ML process and challenges
  2. Overview of ML on Databricks
  3. Utilizing Feature Store
  4. Understanding MLflow Components
  5. Create a Baseline Model for Bank Customer Churn Prediction Using AutoML
  6. Model Versioning and Webhooks
  7. Model Deployment Approaches
  8. Automating ML workflows using the Databricks Jobs
  9. Model Drift detection for our Churn Prediction model and retraining
  10. CI/CD to automate model retraining and re-deployment.

Special order line: only available to educational & business accounts.
£26.99
Product Details
Publisher: Packt Publishing
ISBN: 1801818290 / 9781801818292
Format: eBook (EPUB)
Dewey classification: 006.31
Publication date: 24/11/2023
Country of publication: United Kingdom
Language: English
1 pages
Copy: 100%; print: 100%
Description based on CIP data; resource not viewed.