The Azure Batch AI service has been retired.

The at-scale training capabilities of Batch AI are available in the Azure Machine Learning service. In addition to many other machine learning capabilities, the Azure Machine Learning service includes a cloud-based managed compute target for training and batch scoring machine learning models. The Azure Machine Learning service is generally available, which means that it includes a committed SLA and a choice of support plans. Pricing for Azure infrastructure should not differ between the Batch AI service and the Azure Machine Learning service: in both cases, only the cost of the underlying compute is charged.

Use the Azure Public cloud integration to discover Azure Batch AI Workspaces and collect metrics from them.

External reference

Setup

To set up the Azure integration and discover the Azure service, go to Azure Integration Discovery Profile and select Machine Learning Services Workspaces.

Event support

  • Supported: Azure events for Azure Machine Learning Services Workspaces.
  • Configure Azure Events in the OpsRamp Azure Integration Discovery Profile.

Supported metrics

| OpsRamp Metric | Metric Display Name | Unit | Aggregation Type | Description |
|---|---|---|---|---|
| azure_ml_services_workspaces_Active_Cores | Active Cores | Count | Average | Number of active cores. |
| azure_ml_services_workspaces_Active_Nodes | Active Nodes | Count | Average | Number of active nodes. |
| azure_ml_services_workspaces_Cancel_Requested_Runs | Cancel Requested Runs | Count | Total | Number of runs where cancel was requested for this workspace. |
| azure_ml_services_workspaces_Cancelled_Runs | Cancelled Runs | Count | Total | Number of runs cancelled for this workspace. |
| azure_ml_services_workspaces_Completed_Runs | Completed Runs | Count | Total | Number of runs completed successfully for this workspace. |
| azure_ml_services_workspaces_CpuUtilization | CpuUtilization | Count | Average | Percentage of utilization on a CPU node. |
| azure_ml_services_workspaces_Errors | Errors | Count | Total | Number of run errors in this workspace. |
| azure_ml_services_workspaces_Failed_Runs | Failed Runs | Count | Total | Number of runs failed for this workspace. |
| azure_ml_services_workspaces_Finalizing_Runs | Finalizing Runs | Count | Total | Number of runs that entered the finalizing state for this workspace. |
| azure_ml_services_workspaces_GpuUtilization | GpuUtilization | Count | Average | Percentage of utilization on a GPU node. |
| azure_ml_services_workspaces_Idle_Cores | Idle Cores | Count | Average | Number of idle cores. |
| azure_ml_services_workspaces_Idle_Nodes | Idle Nodes | Count | Average | Number of idle nodes. |
| azure_ml_services_workspaces_Leaving_Cores | Leaving Cores | Count | Average | Number of leaving cores. |
| azure_ml_services_workspaces_Model_Deploy_Failed | Model Deploy Failed | Count | Total | Number of model deployments that failed in this workspace. |
| azure_ml_services_workspaces_Model_Deploy_Started | Model Deploy Started | Count | Total | Number of model deployments started in this workspace. |
| azure_ml_services_workspaces_Model_Deploy_Succeeded | Model Deploy Succeeded | Count | Total | Number of model deployments that succeeded in this workspace. |
| azure_ml_services_workspaces_Model_Register_Failed | Model Register Failed | Count | Total | Number of model registrations that failed in this workspace. |
| azure_ml_services_workspaces_Model_Register_Succeeded | Model Register Succeeded | Count | Total | Number of model registrations that succeeded in this workspace. |
| azure_ml_services_workspaces_Not_Responding_Runs | Not Responding Runs | Count | Total | Number of runs not responding for this workspace. |
| azure_ml_services_workspaces_Not_Started_Runs | Not Started Runs | Count | Total | Number of runs in Not Started state for this workspace. |
| azure_ml_services_workspaces_Preempted_Cores | Preempted Cores | Count | Average | Number of preempted cores. |
| azure_ml_services_workspaces_Preempted_Nodes | Preempted Nodes | Count | Average | Number of preempted nodes. |
| azure_ml_services_workspaces_Preparing_Runs | Preparing Runs | Count | Total | Number of runs that are preparing for this workspace. |
| azure_ml_services_workspaces_Provisioning_Runs | Provisioning Runs | Count | Total | Number of runs that are provisioning for this workspace. |
| azure_ml_services_workspaces_Queued_Runs | Queued Runs | Count | Total | Number of runs that are queued for this workspace. |
| azure_ml_services_workspaces_Quota_Utilization_Percentage | Quota Utilization Percentage | Count | Average | Percent of quota utilized. |
| azure_ml_services_workspaces_Started_Runs | Started Runs | Count | Total | Number of runs running for this workspace. |
| azure_ml_services_workspaces_Starting_Runs | Starting Runs | Count | Total | Number of runs started for this workspace. |
| azure_ml_services_workspaces_Total_Cores | Total Cores | Count | Average | Number of total cores. |
| azure_ml_services_workspaces_Total_Nodes | Total Nodes | Count | Average | Number of total nodes. |
| azure_ml_services_workspaces_Unusable_Cores | Unusable Cores | Count | Average | Number of unusable cores. |
| azure_ml_services_workspaces_Unusable_Nodes | Unusable Nodes | Count | Average | Number of unusable nodes. |
| azure_ml_services_workspaces_Warnings | Warnings | Count | Total | Number of run warnings in this workspace. |
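
The metric names listed above appear to follow one pattern: the prefix azure_ml_services_workspaces_ followed by the display name with spaces replaced by underscores. The Python sketch below illustrates that pattern and the difference between the Average and Total aggregation types. It is an illustration only. The naming rule is inferred from the listed metrics and is not a documented OpsRamp guarantee, and the helper names (display_name, aggregate) are hypothetical and not part of any OpsRamp or Azure API.

```python
# Illustrative helpers only; names and behavior are assumptions based on
# the metric list above, not an OpsRamp or Azure Monitor API.

PREFIX = "azure_ml_services_workspaces_"


def display_name(opsramp_metric: str) -> str:
    """Derive the display name from an OpsRamp metric name.

    Assumes the pattern seen in the metric list: strip the common prefix
    and turn the remaining underscores into spaces.
    """
    if not opsramp_metric.startswith(PREFIX):
        raise ValueError(f"unexpected metric name: {opsramp_metric}")
    return opsramp_metric[len(PREFIX):].replace("_", " ")


def aggregate(samples: list[float], aggregation_type: str) -> float:
    """Combine samples from one time window as the aggregation type implies.

    "Average" suits gauge-like values (cores, nodes, utilization);
    "Total" suits event counts (runs, errors, warnings).
    """
    if not samples:
        raise ValueError("no samples to aggregate")
    if aggregation_type == "Total":
        return float(sum(samples))
    if aggregation_type == "Average":
        return sum(samples) / len(samples)
    raise ValueError(f"unsupported aggregation type: {aggregation_type}")


if __name__ == "__main__":
    print(display_name("azure_ml_services_workspaces_Cancel_Requested_Runs"))
    print(aggregate([2, 4, 6], "Average"))  # gauge-style metric
    print(aggregate([2, 4, 6], "Total"))    # counter-style metric
```

With the same three samples, an Average metric such as Active Cores reports the mean over the window, while a Total metric such as Failed Runs reports the sum.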