Supported Versions |
---|
Windows Server 2019 and Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64) for both MSSQL failover instance & availability group clusters |
Multi-node failover cluster Instance
SQL Server clustering describes a collection of two or more physical servers (nodes), connected via LAN, each of which host a SQL server instance and have the same access to shared storage. Clustering SQL servers provides high availability and protection from disasters when a server hosting the SQL Server instance fails.
Always on failover cluster instances
SQL Server Always On Failover Cluster Instances (FCIs) use Windows Server Failover Clustering (WSFC) to provide high availability at the server instance level. An FCI is a single instance of SQL Server that is installed across WSFC nodes to provide high availability for the entire installation of SQL Server.
Always On availability group
Always On availability groups feature a high availability and disaster recovery solution that provides an alternative to database mirroring, and maximizes the availability of a set of user databases for an enterprise. This supports a failover environment for a discrete set of user databases, known as availability databases.
The availability group supports a set of read-write primary databases and one to eight sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations. Availability group fails over at the level of an availability replica. Failovers are not caused by database issues such as a database becoming suspect due to a loss of a data file, deletion of a database, or corruption of a transaction log.
Pre-requisites
The application would require classic gateway 12.0 and above.
There is no version dependency for cluster gateway.
Powershell cmdlets have the following prerequisites:
- Windows domain User should be able to do powershell remoting
Enable-PSRemoting -Force - Windows domain user should be added to “Remote Management users” group net localgroup “Remote Management Users” /add < user >
- Windows domain user should be added to “Performance monitor users” group net localgroup “Performance monitor users” /add < user >
- Windows domain User should be able to do powershell remoting
Granting Remote DCOM Rights
- To grant users DCOM rights, log on to each monitored system and complete the following procedure:
- Go to command prompt and enter dcomcnfg
- Navigate to component services > computers > My computer and then right click and select Properties. Then go to the COM Security tab.
- Under Access Permissions, go to edit limits and add the domain non-admin user and enable both local and remote access then click OK.
- In Launch and Activation permissions, go to edit limits, add the domain non-admin user and check all boxes , and click OK.
- To grant users DCOM rights, log on to each monitored system and complete the following procedure:
Granting Remote WMI Rights
- To give the user remote WMI rights, log on to each system to be monitored and complete the following procedure:
- Go to “computer management” and select “WMI Control” under ‘Services and Applications’.
- Right-click WMI control and go to properties. Then select the Security tab.
- Select root and click Security.
- Add the domain non-admin user, and check the boxes for execute methods, enable account, remote enable, and reas security.
- Click Advanced, select added non-admin domain user and click edit. Then for “Applies” to select “namespace and sub namespaces” in the dropdown.
- Click OK 3 times .
- To give the user remote WMI rights, log on to each system to be monitored and complete the following procedure:
For monitoring services like windows cluster service and other mssql services (sql server, sql agent , sql browser etc.), windows domain users have to be a part of “Local Administrators group” as we are using win32_Service class for fetching the details.
- refer to below link:
https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.management/get-wmiobject?view=powershell-5.1 - If you do not want to add the user to local administrator group, you can use “Security descriptors” for monitoring the services.
- For that we have to do the configuration as below:
Refer to 3. Enable Windows Service Monitoring section of the doc:
- refer to below link:
Windows domain user should be granted read-only access to cluster
Grant-ClusterAccess -User <domain\user> -Readonly
SQL authorization permissions:
- For monitoring some metrics, we are using JDBC. For JDBC connections we are supporting mixed authentication ( both NLTM and SQL server authentication)
- The minimum privileges for SQL auth are:
- server role: public - This server role is available by default
- securables: We need to enable some securables for connecting to SQL instances,connecting to all databases, and also some read-only permissions like below:
- CONNECT ANY DATABASE
GRANT CONNECT ANY DATABASE to < user >
- CONNECT SQL
GRANT CONNECT SQL to < user >
- SELECT ALL USER SECURABLES
GRANT SELECT ALL USER SECURABLES to < user >
- VIEW ANY DATABASE
GRANT VIEW ANY DATABASE to < user >
- VIEW ANY DEFINITION
GRANT VIEW ANY DEFINITION to < user >
- VIEW SERVER STATE
GRANT VIEW SERVER STATE to < user >
Install the integration
- From All Clients, select a client.
- Go to Setup > Integrations > Integrations.
- From Available Integrations, select Adapter >MS SQL Cluster. The Install MS SQL Cluster Integration popup appears.

- Enter the following information:
- Name: Name of the integration
- Upload Logo: (Optional) logo for the integration.
- GateWay Profiles: Select a gateway management profile to associate with the client.
- Click Install. The Integration page displays the installed integration.

Configure the integration
- In CONFIGURATION section, click +Add.
- On Create Adapter Configuration, enter:
- Name: Configuration name.
- IP Address/Host Name: IP address or host name of the target.
- Notification Alerts: Select TRUE or FALSE.
Notes:- By default FALSE is selected.
- If you select TRUE, application will handle Critical/Recovery failure alert notifications for Connectivity, Authentication Exceptions.
- MSSQL Cluster Type: Select AvailabilityGroup or FailoverClusterInstance from the dropdown.
- SQL Server Authentication Mode: Select WindowsAuthentication or SqlServerAuthentication from the dropdown.
- From the Windows Cluster Credentials section, select Custom and enter Username and Password.
- From the SQL Server Credentials section, select Custom and enter Username and Password.
- From the Resource Types & Metrics section, select the metrics and configure for availability and alert conditions, for Cluster, Server, Group, Storage, & SQL Server.
- In the Discovery Schedule section, configure how frequently the discovery action should trigger. Select Recurrence Pattern to add one of the following patterns:
- Minutes
- Hourly
- Daily
- Weekly
- Monthly
- In the Monitoring Schedule section, configure how frequently the monitoring action should trigger.
- Click Save.

After saving the configuration, the resources are discovered and monitoring is done as specified in the configuration profile.
The configuration is saved and displayed on the page.

You can perform the actions manually, like Discovery, Monitoring or Disable the configuration.
Resource information
- To view the resource information, go to Infrastructure > Resources > Cluster.
- Use the search function to find your resource and then click on it.
View resource attributes
The discovered resource(s) are displayed under clusters. In this page you will get the basic information about the resources such as: Resource Type, Native Resource Type, Resource Name, IP Address etc.

Note: The children cluster nodes are displayed under Components, click any children node and navigate to Metrics to view the individual metrics to the children node.

View resource metrics
To confirm MS SQL Cluster monitoring, review the following:
- Metric graphs: A graph is plotted for each metric that is enabled in the configuration.
- Alerts: Alerts are generated for metrics that are configured as defined for integration.

Supported metrics
Resource Type | Metric Names | Display Name | Unit |
---|---|---|---|
Cluster | windows_cluster_node_state State of all nodes of the cluster such as up or down. Possible values 0-DOWN,1-UP | Windows Cluster Node State | |
windows_cluster_online_nodes_count Count of nodes which are in Online state | Windows Cluster Online Nodes Count | count | |
windows_cluster_node_health Cluster health - percentage of online nodes | Windows Cluster Node Health | % | |
Server | windows_cluster_node_service_status "State of each node's windows os service named cluster service which is responsible for windows failover cluster. Possible values 0-STOPPED,1-RUNNING 2-Start Pending, 3-Stop Pending, 4-Continue Pending, 5-Pause Pending, 6-Paused, 7-Unknown" | Windows Cluster Node Service Status | |
Windows_cluster_node_system_os_Uptime Time lapsed since last reboot in minutes | System Uptime | m | |
Windows_cluster_node_system_cpu_Load Monitors the system's last 1min, 5min and 15min load. It sends per cpu core load average. | System CPU Load | ||
Windows_cluster_node_system_cpu_Utilization The percentage of elapsed time that the processor spends to execute a non-Idle thread(This doesn't includes CPU steal time) | System CPU Utilization | % | |
Windows_cluster_node_system_memory_Usedspace Physical and virtual memory usage in GB | System Memory Used Space | GB | |
Windows_cluster_node_system_memory_Utilization Physical and virtual memory usage in percentage. | System Memory Utilization | % | |
Windows_cluster_node_system_cpu_IdleTime Monitors cpu time in percentage spent in various program spaces. User - The processor time spent running user space processes System - The amount of time that the CPU spent running the kernel. IOWait - The time the CPU spends idle while waiting for an I/O operation to complete Idle - The time the processor spends idle Steal - The time virtual CPU has spent waiting for the hypervisor to service another virtual CPU running on a different virtual machine. Kernal Time Total Time | System CPU IdleTime | % | |
Windows_cluster_node_system_network_interface_InTraffic Monitors In traffic of each interface for windows Devices | System Network In Traffic | Kbps | |
Windows_cluster_node_system_network_interface_OutTraffic Monitors Out traffic of each interface for windows Devices | System Network Out Traffic | Kbps | |
Windows_cluster_node_system_network_interface_InPackets Monitors in Packets of each interface for windows Devices | System Network In packets | packets/sec | |
Windows_cluster_node_system_network_interface_OutPackets Monitors Out packets of each interface for windows Devices | System Network out packets | packets/sec | |
Windows_cluster_node_system_network_interface_InErrors Monitors network in errors of each interface for windows Devices | System Network In Errors | Errors per Sec | |
Windows_cluster_node_system_network_interface_OutErrors Monitors network out errors of each interface for windows Devices | System Network Out Errors | Errors per Sec | |
Windows_cluster_node_system_network_interface_InDiscords >Monitors Network in discards of each interface for windows Devices | System Network In discards | psec | |
Windows_cluster_node_system_network_interface_OutDiscords Monitors network Out Discards of each interface for windows Devices | System Network Out discards | psec | |
windows_cluster_node_sqlservices_state "Monitors sql services state. Possible values 0-STOPPED,1-RUNNING 2-Start Pending, 3-Stop Pending, 4-Continue Pending, 5-Pause Pending, 6-Paused, 7-Unknown" | SQL services state | ||
windows_cluster_node_sqlprocesses_cpuUsage Monitors sql processes cpu usage in seconds | SQL processes CPU usage | s | |
windows_cluster_node_sqlprocesses_handles Monitors sql processes handles count | SQL processes handles | count | |
Group | windows_mssql_cluster_group_running_status "State of cluster group of the failover cluster. Possible values 0-OFFLINE,1-ONLINE 2-Initializing, 3-Failed, 4-Pending, 5-OnlinePending, 6-OfflinePending, 7-Unknown" | Windows MSSQL Cluster Group Running Status | |
windows_mssql_cluster_group_failover_status Whenever the owner node which is hosting all the cluster services, goes down any other node becomes owner node automatically. This metric explains whether the current node was the owner node, when the last failover happened. Possible instance values, 0-if there is no change in OwnerNode, 1-If there is a change in OwnderNode, 2 If no OwnerNode | Windows MSSQL Cluster Group Failover Status | ||
windows_mssql_cluster_group_listener_status "Monitors mssql cluster group listeners status. Possible values 0-OFFLINE,1-ONLINE 2-Initializing, 3-Failed, 4-Pending, 5-OnlinePending, 6-OfflinePending, 7-Unknown" | Windows MSSQL cluster group listener status | ||
windows_mssql_cluster_AG_instance_status Monitors mssql availability group cluster instance status. Possible values 0-OFFLINE,1-ONLINE 2-Initializing, 3-Failed, 4-Pending, 5-OnlinePending, 6-OfflinePending, 7-Unknown. This metric is applicable only for AAG cluster. | windows MSSQL cluster availability group instance status | ||
Storage | windows_mssql_cluster_disk_AvailabilityStatus "Monitors system disk availability status. Possible values 0-OFFLINE,1-ONLINE 2-Initializing, 3-Failed, 4-Pending, 5-OnlinePending, 6-OfflinePending, 7-Unknown" | System Disk availability status | |
windows_mssql_cluster_disk_FailoverStatus Whenever the owner node which is hosting storage disk, goes down any other node becomes owner node automatically. This metric explains whether the current node was the owner node, when the last failover happened. Possible instance values, 0-if there is no change in OwnerNode, 1-If there is a change in OwnderNode, 2 If no OwnerNode | System Disk Failover status | ||
windows_mssql_cluster_disk_Usedspace Monitors disk used space in GB | System Disk UsedSpace | GB | |
windows_mssql_cluster_disk_Utilization Monitors disk utilization in percentage | System Disk Utilization | % | |
windows_mssql_cluster_disk_Freespace Monitors the Free Space usage in GB | System Disk FreeSpace | GB | |
SQL_SERVER | windows_mssql_cluster_instance_RunningStatus "Monitors MSSQL failover cluster instance running status. Possible values 0-OFFLINE,1-ONLINE 2-Initializing, 3-Failed, 4-Pending, 5-OnlinePending, 6-OfflinePending, 7-Unknown" | MSSQL Failover Cluster Instance Running Status | |
windows_mssql_cluster_instance_FailoverStatus Monitors MSSQL cluster instance failover status. Possible instance values, 0-if there is no change in OwnerNode, 1-If there is a change in OwnderNode, 2 If no OwnerNode. This metric is applicable only for FCI cluster. | MSSQL Failover Cluster Instance Failover Status | ||
windows_mssql_cluster_avgLockWaitTime The average amount of wait time (milliseconds) for each lock request that resulted in a wait. | Average Lock Wait Time | ms | |
windows_mssql_cluster_bufferCacheHitRatio Percentage of pages that were found in the buffer pool without having to incur a read from disk. | Buffer Cache Hit Ration | % | |
windows_mssql_cluster_sqlCacheMemory Total amount of dynamic memory the server is using for the dynamic SQL cache | SQL Cache Memory | KB | |
windows_mssql_cluster_datafileSize The cumulative size of all the data files in the database. | Data File Size | KB | |
windows_mssql_cluster_databasesCount Show the count of total number of databases | Databases Count | count | |
windows_mssql_cluster_fullScans Number of unrestricted full scans. These can either be base table or full index scans. | SQL Server Full Scans | psec | |
windows_mssql_cluster_databasepageReads Number of physical database page reads issued. | Databases Page Reads | rps | |
windows_mssql_cluster_databasepageWrites Number of physical database page writes issued. | Database Page Writes | wps | |
windows_mssql_cluster_serverUptime Monitors the uptime (in days) of the database server | Server Uptime | Days | |
windows_mssql_cluster_totalServerMemory Total amount of dynamic memory the server is currently consuming | Total Server Memory | KB | |
windows_mssql_cluster_databaseTransactions The number of transactions started for the database. | Database Transactions | psec | |
windows_mssql_cluster_databaseStatus 0 = ONLINE 1 = RESTORING 2 = RECOVERING SQL Server 2008 and later 3 = RECOVERY_PENDING SQL Server 2008 and later 4 = SUSPECT 5 = EMERGENCY SQL Server 2008 and later 6 = OFFLINE SQL Server 2008 and later 7 = COPYING Azure SQL Database Active Geo-Replication 10 = OFFLINE_SECONDARY Azure SQL Database Active Geo-Replication | Database Status | ||
windows_mssql_cluster_databaseActiveUsers Monitors the number of active user transactions per database | Database Active Users | count | |
windows_mssql_cluster_datafilesFreeSpace Monitors datafiles free space regardless of auto-growth | Data Files Free Space | % | |
windows_mssql_cluster_databaseFreeSpace Monitors Database Freespace in Percentage | Database Free Space | % | |
windows_mssql_cluster_dbLogCacheHitRatio Monitors MSSQL database log cache-hit ratio | DB Log Cache Hit Ratio | % | |
windows_mssql_cluster_dbLogfilesFreeSpace Monitors MSSQL database LogFiles free space | DB Logfiles Free Space | % | |
windows_mssql_cluster_minsSinceLastLogBackup Monitors Database transaction log backup in minutes | Mins since Last Log Backup | m | |
windows_mssql_cluster_minsSinceLastLogFullBackup Monitors Database Backup status in Minutes since Last Full Backup | Mins since Last Log Full Backup | m | |
windows_mssql_cluster_fileGroupFreespaceWithAvailableDisk Monitor MSSQL DB Filegroup Freespace with Available Disk | File Group Free Space with Available Disk | % | |
windows_mssql_cluster_dbFileGroupFreespace Monitors MSSQL DB Filegroup Freespace | DB File Group Free Space | % | |
windows_mssql_cluster_longRunningQueriesLast5minCount Monitors MSSQL Database Long Running queries count Last 5 minutes | Long Running Queries Last 5min Count | count | |
windows_mssql_cluster_serverIOBusy Monitors MSSQL DB Server IO Busy | Server IO Busy | % | |
windows_mssql_cluster_dbWorkspaceMemory Monitors MSSQL Database Workspace Memory in Percentage | DB Workspace Memory | % | |
windows_mssql_cluster_dbInstanceCpuUtilization Monitors MSSQL Database Instance CPU Utilization in percentage | DB Instance CPU Utilization | % | |
windows_mssql_cluster_logfileFreespaceWithAvailableDisk Monitors MSSQL DB Logfile Freespace with Available Disk in percentage | Logfile Free Space with Available Disk | % | |
windows_mssql_cluster_instanceFreeConnections Monitors MSSQL Database Instance Free Connections in percentage | Instance Free Connections | % | |
windows_mssql_cluster_daysSinceLastFullBackup MSSQL DB Backup Days Since Last Full Backup | Days since Last Full Backup | Days | |
windows_mssql_cluster_daysSinceLastLogBackup MSSQL DB Backup Days Since Last Log Backup | Days since Last Log Backup | Days | |
windows_mssql_cluster_daysSinceLastDifferentialBackup MSSQL DB Backup Days Since Last Differential Backup | Days since Last Differential Backup | Days | |
windows_mssql_cluster_alwaysOnDBAvailabilitySyncHealth MSSQL AlwaysOn DataBase Availability Synchronization Health - Below are the possible states:0 : NOT_HEALTHY 1 : PARTIALLY_HEALTHY 2 : HEALTHY. This metric is applicable only for AAG cluster. | AlwaysOn DB Avalability Sync Health | ||
windows_mssql_cluster_alwaysOnAGSyncHealth MSSQL AlwaysOn Availability Group Synchronization Health - Below are the possible values: 0 : NOT_HEALTHY 1 : PARTIALLY_HEALTHY 2 : HEALTHY. This metric is applicable only for AAG cluster. | AlwaysOn AG Sync Health | ||
windows_mssql_cluster_alwaysOnListenerState MSSQL AlwaysOn Listener State - Below are the possible values: 0 : OFFLINE 1 : ONLINE 2 : ONLINE_PENDING 3 : FAILED. This metric is applicable only for AAG cluster. | AlwaysOn Listener State | ||
windows_mssql_cluster_cpuBusy MSSQL CPU Busy | CPU Busy | microsec | |
windows_mssql_cluster_cpuIdle MSSQL CPU IDLE | CPU Idle | microsec | |
windows_mssql_cluster_cpuIOBusy MSSQL CPU IoBusy | CPU IO Busy | microsec | |
windows_mssql_cluster_alwayson_database_replicaState "It monitor the MSSQL Always On Database Replica synchronization state along with role description.Below are the possible states: PRIMARY_NOT SYNCHRONIZING - 0 PRIMARY_SYNCHRONIZING - 1 PRIMARY_SYNCHRONIZED - 2 PRIMARY_REVERTING - 3 PRIMARY_INITIALIZING - 4 SECONDARY_NOT SYNCHRONIZING - 5 SECONDARY_SYNCHRONIZING - 6 SECONDARY_SYNCHRONIZED - 7 SECONDARY_REVERTING - 8 SECONDARY_INITIALIZING - 9 This metric is applicable only for AAG cluster." | MSSQL AlwaysOn availability Group Replica State |
Supported product versions
- The application is validated on Windows Server 2019 and Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64) for both MSSQL failover instance & availability group clusters.
Risks, Limitations & Assumptions
- This sdk application only supports mssql-cluster which is built upon windows failover cluster.
- If an application is uninstalled or application configurations are removed, then the resources associated with that particular application/application configuration will also get deleted along with agent installed devices.
- You have to provide credentials in “SQL Server Credentials” section even though you opted for “WindowsAuthentication” as authentication mode while adding configuration.
- For Cluster object discovery and monitoring implementation, we are considering the object which has “Name” equals to Cluster Name in Get-ClusterResource response
- For ClusterGroup monitoring implementation, we are considering the object which has Name as Cluster Group in Get-ClusterGroup response All the failover status metrics possible instance values are 0 - if there is no change in OwnerNode, 1 - If there is a change in OwnerNode, 2 - If no OwnerNode
- Application can handle Critical/Recovery failure alert notifications for below two cases of when you enable Notification Alerts in configuration:
- Connectivity Exception
- Authentication Exception
- If you enable agent monitoring templates on the Cluster/Node resource, you might see the duplicate metrics with different naming conventions.
- While trying to fetch the node ip address we receive multiple node ips, which include many local ips and actual ips (example: lets assume actual node ip is 10.1.1.1 when trying to fetch the details we will receive two ips one associated with custer(192.168.0.0) and other is the actual ip). to identify the actual node ip address from the list of ip addresses received we assume that node ip address is part of the same subnet of cluster ip address, which means if cluster ip is 10.1.1.1 then node ips will be 10.1.X.X.
- We have provided the provision to give Cluster Ip Address OR HostName in configuration, but hostName provision will work only if the host name resolution works.
- Component level threshold configuration is not possible
- Resource level metric threshold customization and frequency setting are not possible
- Usability issues in application configuration page while adding/editing
- Optional configuration parameters cannot be defined
- Application upgrade is manual process without version change