The following figure represents the basic system components or building blocks, which cooperate to implement the ITOM feature set:

Documentation Information Model

The arrows indicate a generalized workflow. Resources are discovered, and resource management is updated accordingly. Monitoring uses managed-resource information to scan for, or wait on, resource fault and recovery alerts, and forwards any alert condition to alert correlation. The alert is resolved to a context-sensitive event, and alert management applies management logic to remediate the alert condition, using automation to take the appropriate response.

Integrations are provided for the platform layer and, in the solution layer, for discovery and monitoring and for event management. These integrations provide interactivity with external devices and services, extending the functionality, compatibility, and scalability of the core platform.

Platform layer

The platform layer implements the core functionality on which the higher-level functions of the solution layer are built. In general, configuration and policy are set in the platform layer and govern the operation of the solution layer.

The following integrations are provided to support platform functionality:

  • Password Management
  • SSO
  • Duo Security
  • Stream exports

The platform layer includes the following elements supporting enterprise resource management:

  • resource management
  • dashboards
  • ticketing
  • reporting

In addition, the platform layer supports a multi-tenancy model for managing system accounts and users, which is a key construct for scoping the platform. Tenancy partitions the platform hierarchically: a partner entity has multiple client entities, and each client hosts multiple users. Authorizations, permissions, and roles provide complete, user-based management functionality, shown as Users, Groups, and RBAC in the figure.
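The partner, client, and user hierarchy and the role-based checks described above can be modeled as a small data structure. The sketch below is illustrative only: the role names, permission strings, and the `is_allowed` helper are hypothetical and do not reflect the platform's actual RBAC schema. It assumes that a user may act only within the client that hosts them, and only with permissions granted by their roles.

```python
from dataclasses import dataclass, field

# Hypothetical role-to-permission mapping (illustrative names, not the platform's real roles).
ROLE_PERMISSIONS = {
    "admin": {"resource:view", "resource:manage", "alert:manage"},
    "operator": {"resource:view", "alert:manage"},
    "viewer": {"resource:view"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


@dataclass
class Client:
    name: str
    users: list = field(default_factory=list)


@dataclass
class Partner:
    name: str
    clients: list = field(default_factory=list)


def permissions_for(user: User) -> set:
    """Union of the permissions granted by each of the user's roles."""
    granted = set()
    for role in user.roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def is_allowed(partner: Partner, user_name: str, client_name: str, permission: str) -> bool:
    """Tenancy scoping: the user must belong to the target client and hold the permission."""
    for client in partner.clients:
        if client.name != client_name:
            continue
        for user in client.users:
            if user.name == user_name:
                return permission in permissions_for(user)
    # User not found under this client: deny by default.
    return False


msp = Partner("example-partner", [
    Client("client-a", [User("alice", {"operator"})]),
    Client("client-b", [User("bob", {"viewer"})]),
])
print(is_allowed(msp, "alice", "client-a", "alert:manage"))   # True
print(is_allowed(msp, "alice", "client-b", "resource:view"))  # False: alice is not hosted by client-b
```

Denying by default when the user is not found under the target client is what keeps one client's users from reaching another client's resources in this sketch.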

Agent and gateway components enable distributed operation in a cloud environment. Co-resident with the services or network resources they monitor, they aggregate data from the managed devices and forward it to the cloud. They can also be configured to run automated scripts and perform other housekeeping functions.

Finally, the API provides a full-featured REST interface for automating management operations at scale. The following figure summarizes the API:

API
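Automating operations at scale through a REST API usually means walking paginated list endpoints. The sketch below assumes a JSON response shaped like `{"results": [...], "nextPage": true}`, the query parameters `pageNo` and `pageSize`, and bearer-token authentication; the path `/clients/{client_id}/resources` and all field names are hypothetical placeholders, so consult the API reference for the actual endpoints and schema. The HTTP call is injected as a function so that the paging logic can be exercised without a live tenant.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# fetch(path, params) -> decoded JSON page
Fetch = Callable[[str, dict], dict]


def make_http_fetch(base_url: str, token: str) -> Fetch:
    """Build a fetch function that issues authenticated GET requests (bearer token assumed)."""
    def fetch(path: str, params: dict) -> dict:
        url = f"{base_url}{path}?{urllib.parse.urlencode(params)}"
        request = urllib.request.Request(url, headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        })
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    return fetch


def iter_resources(fetch: Fetch, client_id: str, page_size: int = 100) -> Iterator[dict]:
    """Yield every resource for one client, following pages until nextPage is false."""
    page_no = 1
    while True:
        # Hypothetical path; substitute the documented endpoint.
        page = fetch(f"/clients/{client_id}/resources",
                     {"pageNo": page_no, "pageSize": page_size})
        yield from page.get("results", [])
        if not page.get("nextPage"):
            break
        page_no += 1
```

Because `iter_resources` is a generator, a bulk job can process a large inventory one page at a time instead of holding every resource in memory.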

Solution layer

The solution layer, building on the services of the platform layer, implements the functionality needed for ITOM and AIOps. It consists of hybrid discovery and monitoring, event and incident management, and remediation and automation.

Integrations are provided to support the functionality of each of these areas:

  • discovery and monitoring integrations:

    • Public cloud
    • Cloud native
    • Compute
    • Data exports
    • Network
    • Storage
  • event management integrations:

    • Collaboration
    • Configuration automation
    • Custom integration
    • Patch management
    • Third-party events
    • Ticketing and ITSM

Hybrid Discovery and Monitoring

A broad range of IT resources across data center, public cloud, and cloud native environments can be discovered and monitored with agent-based and agentless monitors. These include:

  • Data center applications, URLs, containers, servers, and network resources.
  • Public cloud environments of compute instances, databases, load balancers, and PaaS services.
  • Cloud native environments with containers and orchestrators.

Built-in monitors capture availability and performance metrics and observe optimal threshold limits for supported resources. You can extend the platform to monitor any kind of IT resource by writing custom monitor scripts.
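A custom monitor script typically collects a metric, compares it with thresholds, and reports a state. The sketch below measures disk usage with Python's standard library; the threshold values, the metric name `disk.used.percent`, and the JSON output format are assumptions for illustration, not the platform's documented custom-monitor contract.

```python
import json
import shutil

# Hypothetical thresholds, in percent of disk used.
WARNING_PCT = 80.0
CRITICAL_PCT = 90.0


def evaluate(value: float, warning: float, critical: float) -> str:
    """Map a metric value to a state by comparing it with the thresholds."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


def collect_disk_usage(path: str = "/") -> dict:
    """Measure disk usage for the filesystem containing path and attach a state."""
    usage = shutil.disk_usage(path)
    used_pct = round(usage.used / usage.total * 100, 2)
    return {
        "metric": "disk.used.percent",  # hypothetical metric name
        "value": used_pct,
        "state": evaluate(used_pct, WARNING_PCT, CRITICAL_PCT),
    }


if __name__ == "__main__":
    # Emit one JSON line; the format a real collector expects is an assumption here.
    print(json.dumps(collect_disk_usage()))
```

Keeping `evaluate` separate from collection lets the same threshold logic be reused for any metric a custom script gathers.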

Add-ons

  • Adapter Integrations – Enables Adapter-category apps to discover and monitor end devices.

    Refer to the Compute, Network, and Storage sections under Integrations.

  • Batch Exports - Batch export helps you incorporate platform-generated enterprise data into your data collection and analysis. You can snapshot and batch export the following types of data for each client, on demand or at scheduled intervals, to Amazon S3 and Microsoft Azure Blob Storage:

    • Ticket data
    • Alert data
    • Metric data
    • Inventory data
    • Usage data

    For more information, see Batch Exports.

  • Extended Data Retention - Retain the asset data for 12 months.

  • Mask Resource Identity Management - Mask the text of captured sensitive information, including MAC addresses, IP addresses, and host names.

  • Offline Alerts - Triggers an alert when any resource goes into an unknown state.

  • Projects Management - This is deprecated.

  • Service Catalog Management – This is deprecated.

  • SKU Management - Verifies SKU units. When enabled, this module allows you to define SKUs and manage them for resources.

  • Stream exports - Sends event data to the target location without scheduling a data export. The Streaming Export feature streams live data to third-party tools through AWS EventBridge, configured with the Export Integration and Create Streaming Export options. Streaming exports can be edited or deleted, and the exported data is viewable at the target locations.

    For more information, see Stream exports.
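The Mask Resource Identity Management add-on described above replaces sensitive identifiers in captured text. The sketch below shows one way such masking can work, using regular expressions for IPv4 and MAC addresses and an explicit list of host names; the patterns, the `****` mask token, and the `mask_identity` function are hypothetical and are not the add-on's actual rules.

```python
import re

# Loose IPv4 pattern (does not validate octet ranges, which is acceptable for masking).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# MAC address with ':' or '-' separators, e.g. 00:1A:2B:3C:4D:5E.
MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")


def mask_identity(text: str, host_names=(), mask: str = "****") -> str:
    """Replace MAC addresses, IPv4 addresses, and the given host names with a mask token."""
    text = MAC.sub(mask, text)
    text = IPV4.sub(mask, text)
    for host in host_names:
        # Host names are matched literally and case-insensitively as whole words.
        text = re.sub(rf"\b{re.escape(host)}\b", mask, text, flags=re.IGNORECASE)
    return text


print(mask_identity("Host web-01 at 10.0.0.5 (00:1A:2B:3C:4D:5E) is down", ["web-01"]))
# Host **** at **** (****) is down
```

Host names are supplied explicitly rather than guessed with a pattern, because a generic host-name regex would also mask ordinary words.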

Event and Incident Management

Events represent business-impacting issues that require a response. Event and incident management uses escalation policies to aggregate, interpret, and act on events detected by monitors, resource diagnostics, and third-party integrations.

Using service maps, you can visualize the relationship between monitored resources and assess business and user impact based on resource health.

Event interpretation and response can be automated. Automation correlates and suppresses alerts, notifies users, and creates incident tickets for alerts that need operator intervention.
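Correlation and suppression can be illustrated with a simple deduplication rule: alerts for the same resource and metric that arrive within a time window are folded into one group, and only the group drives notifications and tickets. The five-minute window, the grouping key, the `Alert` type, and the action policy below are assumptions for illustration, not the platform's built-in correlation logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    resource: str
    metric: str
    state: str        # e.g. "CRITICAL", "WARNING", "OK"
    timestamp: float  # seconds since epoch


def correlate(alerts, window_s=300.0):
    """Fold alerts with the same (resource, metric) key into one group while each
    new alert arrives within window_s of the group's most recent alert."""
    latest = {}  # (resource, metric) -> most recent group for that key
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.resource, alert.metric)
        group = latest.get(key)
        if group is not None and alert.timestamp - group["last_seen"] <= window_s:
            group["suppressed"] += 1          # duplicate: suppressed, not re-notified
            group["last_seen"] = alert.timestamp
            group["state"] = alert.state      # track the latest state
        else:
            group = {"key": key, "first_seen": alert.timestamp,
                     "last_seen": alert.timestamp, "suppressed": 0,
                     "state": alert.state}
            latest[key] = group
            groups.append(group)
    return groups


def actions_for(group):
    """Hypothetical response policy: tickets only for groups that are still critical."""
    if group["state"] == "CRITICAL":
        return ["notify", "create_ticket"]
    if group["state"] == "WARNING":
        return ["notify"]
    return []
```

Measuring the window from the most recent alert, rather than the first, keeps a continuously flapping condition in a single group.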

Add-ons

  • Alert Problem Area – Alert Problem Area enriches the alert Problem Area field with information extracted from the alert subject or description. It is typically used for log-type alerts, where rich information is embedded in the alert subject or description but the Metric field contains only a generic metric name. If the Problem Area field is not enriched, it defaults to the value of the alert Metric field.

    For more information, see Alert Problem Area.

  • Knowledge Management - Enables knowledge management, which lets users capture product information, operational procedures, and frequently asked questions as a reference source for the organization.

    For more information, see Knowledge Base Management.

  • OS Service Start/Stop Actions - This add-on provides the ability to start and stop OS services on agent-installed devices, given the required permissions. Navigation: Infrastructure > Device Details > Services.

  • Scheduled Task Management – The Scheduled Task entity provides the ability to schedule and run recurring tasks for a predefined duration and at a specified time period. Each instance of a scheduled task is recorded and grouped as Tasks in the Scheduled Task listing.

    For more information, see Scheduled Task Management.

  • SLA Management – When enabled, this module lets you configure the response SLA and resolution SLA for a ticket based on its priority.

    An SLA (Service Level Agreement) is a negotiated and agreed contract between a requester and an assignee for resolving entities. An SLA quantifies acceptable service levels and specifies when services are delivered.

    For more information, see SLA Management.

  • SMS and Voice - A paid add-on that, when enabled, provides the ability to notify users by SMS and voice.
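The Alert Problem Area enrichment described above follows a simple rule: extract a problem area from the alert subject or description when possible, otherwise fall back to the Metric field. The sketch below illustrates that rule; the two regular expressions and the alert dictionary keys (`subject`, `description`, `metric`) are hypothetical examples, not the add-on's actual extraction patterns.

```python
import re

# Hypothetical extraction rules; each exposes the extracted text as the "area" group.
RULES = [
    # "Disk /var is full" -> "/var", "Volume C: low" -> "C:"
    re.compile(r"(?:disk|volume)\s+(?P<area>[A-Za-z]:|/\S*)", re.IGNORECASE),
    # "service 'nginx' stopped" -> "nginx"
    re.compile(r"service\s+'(?P<area>[^']+)'", re.IGNORECASE),
]


def problem_area(alert: dict) -> str:
    """Return the first rule match from the subject, then the description;
    otherwise default to the alert's Metric value."""
    for text in (alert.get("subject", ""), alert.get("description", "")):
        for rule in RULES:
            match = rule.search(text)
            if match:
                return match.group("area")
    return alert["metric"]
```

Searching the subject before the description mirrors the order in which the text above names the two sources.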

Remediation and Automation

Event remediation can also be automated by composing workflows that handle events. Workflows can send SMS, voice, and email notifications, and remote SSH is supported for alert resolution.
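Composing a remediation workflow amounts to chaining steps that each act on the alert context. The sketch below chains notification and remote-command steps; the step names, the context keys (`alert`, `host`, `log`), and the log format are hypothetical, and the remote-command step only records the command instead of opening a real SSH session.

```python
from typing import Callable

# A step takes the workflow context and returns it, possibly updated.
Step = Callable[[dict], dict]


def notify(channel: str) -> Step:
    """Build a step that records a notification on the given channel (e.g. email, sms)."""
    def step(ctx: dict) -> dict:
        ctx.setdefault("log", []).append(f"notify via {channel}: {ctx['alert']}")
        return ctx
    return step


def remote_command(command: str) -> Step:
    """Build a step for a remote command. Placeholder: a real step would run the
    command over SSH on ctx['host']; this one only records the intent."""
    def step(ctx: dict) -> dict:
        ctx.setdefault("log", []).append(f"ssh {ctx['host']}: {command}")
        return ctx
    return step


def workflow(*steps: Step) -> Step:
    """Compose steps into one step that runs them in order."""
    def run(ctx: dict) -> dict:
        for step in steps:
            ctx = step(ctx)
        return ctx
    return run


restart_web = workflow(
    notify("email"),
    remote_command("systemctl restart nginx"),
    notify("sms"),
)
result = restart_web({"alert": "nginx down", "host": "web01"})
```

Because a composed workflow is itself a step, smaller workflows can be nested inside larger ones.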

Add-ons

  • Application Management - This is deprecated.

  • Knowledge Management - Enables knowledge management, which lets users capture product information, operational procedures, and frequently asked questions as a reference source for the organization.

    For more information, see Knowledge Base Management.

  • OS Service Start/Stop Actions - This add-on provides the ability to start and stop OS services on agent-installed devices, given the required permissions. Navigation: Infrastructure > Device Details > Services.

  • Process Automation - This add-on provides the ability to define and execute process automation tasks.

    For more information, see Process Automation.

  • Remote Access Management - This is used to enable remote access (RDP, SSH, etc.) to managed devices.

    For more information, see Remote Access Management.

    Contact Support to change the add-on permission.