Manage models and experiments in Azure Databricks

Study notes

Model Lifecycle
MLflow
  • open-source platform designed to manage the machine learning development lifecycle; an important part of machine learning with Azure Databricks.
  • makes it easy for data scientists to train models and make them available without writing a great deal of code.
Components:
  • MLflow Tracking
    Provides the ability to audit the results of prior model training executions.
    For each run in an experiment, a data scientist may log parameters, versions of libraries used, evaluation metrics, and generated output files when training machine learning models. Each run contains several key attributes, including:
    • Parameters
      Key-value pairs that represent inputs. Use parameters to track hyperparameters, that is, inputs to functions that affect the machine learning process.
    • Metrics
      Key-value pairs that represent how the model is performing. These can include evaluation measures such as root mean square error (RMSE), and metrics can be updated throughout the course of a run. This allows a data scientist, for example, to track RMSE for each epoch of a neural network.
    • Artifacts
      Output files. Artifacts may be stored in any format and can include models, images, log files, data files, or anything else that might be important for model analysis and understanding.
  • MLflow Projects
    A way of packaging up code that allows for consistent deployment and the ability to reproduce results (see the sketch after this list).
    Each project includes at least one entry point, which is a file (either .py or .sh) intended to act as the starting point for project use.
  • MLflow Models
    An MLflow Model is a directory containing an arbitrary set of files along with an MLmodel file in the root of the directory.
    Offers a standardized format for packaging models for distribution.
    Each model has a signature, which describes the expected inputs and outputs for the model.
  • MLflow Model Registry
    Allows data scientists to register models in a registry.
    The data scientist registers a model with the Model Registry, storing details such as the name of the model.
    Each registered model may have multiple versions, which allow a data scientist to keep track of model changes over time.
    It is also possible to stage models. Each model version may be in one stage, such as Staging, Production, or Archived.
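
The following is a minimal sketch of launching an MLflow Projects run from Python. The project URI, the "main" entry point name, and the "alpha" parameter are hypothetical and should be adapted to a real project.

import mlflow

# Run a project from a (hypothetical) Git repository; MLflow resolves its
# MLproject file and executes the named entry point with the given parameters.
submitted = mlflow.projects.run(
    uri="https://github.com/example/example-mlflow-project",
    entry_point="main",
    parameters={"alpha": 0.5},
)
print(submitted.run_id)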
MLflow Models and MLflow Projects combine with the MLflow Model Registry to allow operations team members to deploy models in the registry, serving them either through a REST API or as part of a batch inference solution using Azure Databricks, as shown in the sketch below.
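
Below is a minimal sketch of that flow, assuming a scikit-learn model, toy data, and the hypothetical registered-model name "nyc-taxi-fare"; names, data, and stages should be adapted to your workspace.

import mlflow
import mlflow.pyfunc
import mlflow.sklearn
import pandas as pd
from mlflow.models.signature import infer_signature
from mlflow.tracking import MlflowClient
from sklearn.linear_model import LinearRegression

# Toy training data (hypothetical).
X = pd.DataFrame({"input1": [1.0, 2.0, 3.0], "input2": [4.0, 5.0, 6.0]})
y = pd.Series([1.5, 2.5, 3.5], name="target")

with mlflow.start_run() as run:
    model = LinearRegression().fit(X, y)

    # The signature records the model's expected inputs and outputs.
    signature = infer_signature(X, model.predict(X))

    # Log the model in MLflow Model format, including its signature.
    mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)

# Register the logged model and move the new version to the Staging stage.
model_uri = f"runs:/{run.info.run_id}/model"
registered = mlflow.register_model(model_uri, "nyc-taxi-fare")

client = MlflowClient()
client.transition_model_version_stage(
    name="nyc-taxi-fare",
    version=registered.version,
    stage="Staging",
)

# Load the staged version back as a generic Python function for batch inference.
staged_model = mlflow.pyfunc.load_model("models:/nyc-taxi-fare/Staging")
predictions = staged_model.predict(X)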

Experiments
Creating an experiment in Azure Databricks happens automatically when you start a run:

import mlflow

with mlflow.start_run():
    # input1, input2, and rmse are assumed to be defined earlier in the notebook.
    mlflow.log_param("input1", input1)
    mlflow.log_param("input2", input2)
    # Perform operations here, such as model training.
    mlflow.log_metric("rmse", rmse)
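
To keep related runs together, an experiment can also be selected (or created) explicitly before the run starts, and output files can be attached to the run as artifacts. The sketch below uses a hypothetical experiment path and output file to illustrate mlflow.set_experiment and mlflow.log_artifact.

import mlflow

# Select (or create) an experiment by workspace path; the path is hypothetical.
mlflow.set_experiment("/Users/someone@example.com/my-experiment")

with mlflow.start_run():
    mlflow.log_param("alpha", 0.01)
    mlflow.log_metric("rmse", 0.35)

    # Artifacts: attach any output file to the run, for example a text report.
    with open("report.txt", "w") as f:
        f.write("example training report")
    mlflow.log_artifact("report.txt")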

References:
Use MLflow to track experiments in Azure Databricks - Training | Microsoft Learn
Manage machine learning models in Azure Databricks - Training | Microsoft Learn
Track Azure Databricks experiments in Azure Machine Learning - Training | Microsoft Learn