Amazon EMR is a managed cluster platform that simplifies running big data frameworks (such as Apache Hadoop and Apache Spark) on AWS to process and analyze vast amounts of data.

By using these frameworks and related open-source projects (such as Apache Hive and Apache Pig), you can:

  • Process data for analytics purposes and business intelligence workloads.
  • Transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Use the AWS public cloud integration to discover the AWS service and collect metrics for it.

External reference

What Is Amazon EMR?

Setup

To set up the AWS integration and discover the AWS service, go to AWS Integration Discovery Profile and select EMR.

Event support

CloudTrail event support

  • Supported
  • Configurable in the OpsRamp AWS Integration Discovery Profile.

CloudWatch alarm support

  • Supported
  • Configurable in the OpsRamp AWS Integration Discovery Profile.

Supported metrics

| OpsRamp Metric | Description | AWS Metric | Metric Display Name | Unit | Aggregation Type |
|---|---|---|---|---|---|
| aws_elasticmapreduce_IsIdle | Indicates that a cluster is no longer performing work, but is still alive and accruing charges. Set to 1 if no tasks and jobs are running; set to 0 otherwise. | IsIdle | IsIdle | Count | Average |
| aws_elasticmapreduce_ContainerAllocated | Number of resource containers allocated by the ResourceManager. | ContainerAllocated | ContainerAllocated | Count | Sum |
| aws_elasticmapreduce_ContainerReserved | Number of containers reserved. | ContainerReserved | ContainerReserved | Count | Sum |
| aws_elasticmapreduce_ContainerPending | Number of containers in the queue that have not yet been allocated. | ContainerPending | ContainerPending | Count | Sum |
| aws_elasticmapreduce_AppsCompleted | Number of applications submitted to YARN (Hadoop generation) that have completed. | AppsCompleted | AppsCompleted | Count | Sum |
| aws_elasticmapreduce_AppsKilled | Number of killed applications submitted to YARN (Hadoop generation). | AppsKilled | AppsKilled | Count | Sum |
| aws_elasticmapreduce_AppsPending | Number of applications submitted to YARN (Hadoop generation) that are in a pending state. | AppsPending | AppsPending | Count | Sum |
| aws_elasticmapreduce_AppsRunning | Number of applications submitted to YARN (Hadoop generation) that are running. | AppsRunning | AppsRunning | Count | Sum |
| aws_elasticmapreduce_AppsSubmitted | Number of applications submitted to YARN (Hadoop generation). | AppsSubmitted | AppsSubmitted | Count | Sum |
| aws_elasticmapreduce_CapacityRemainingGB | Amount of remaining HDFS disk capacity. | CapacityRemainingGB | CapacityRemainingGB | Bytes | Sum |
| aws_elasticmapreduce_CoreNodesRunning | Number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists. | CoreNodesRunning | CoreNodesRunning | Count | Sum |
| aws_elasticmapreduce_CoreNodesPending | Number of core nodes waiting to be assigned. All of the core nodes requested may not be immediately available; this metric reports the pending requests. | CoreNodesPending | CoreNodesPending | Count | Sum |
| aws_elasticmapreduce_CorruptBlocks | Number of blocks that HDFS reports as corrupted. A nonzero value can give insight into what is slowing down processing on the cluster. | CorruptBlocks | CorruptBlocks | Count | Sum |
| aws_elasticmapreduce_HDFSUtilization | Percentage of HDFS storage currently used. | HDFSUtilization | HDFSUtilization | Percent | Average |
| aws_elasticmapreduce_HDFSBytesRead | Number of bytes read from HDFS. | HDFSBytesRead | HDFSBytesRead | Bytes Read | Sum |
| aws_elasticmapreduce_HDFSBytesWritten | Number of bytes written to HDFS. | HDFSBytesWritten | HDFSBytesWritten | Bytes Written | Sum |
| aws_elasticmapreduce_LiveDataNodes | Percentage of data nodes that are receiving work from Hadoop. | LiveDataNodes | LiveDataNodes | Percent | Average |
| aws_elasticmapreduce_MRTotalNodes | Number of nodes presently available to MapReduce jobs. | MRTotalNodes | MRTotalNodes | Count | Sum |
| aws_elasticmapreduce_MRActiveNodes | Number of nodes presently running MapReduce tasks or jobs. | MRActiveNodes | MRActiveNodes | Count | Sum |
| aws_elasticmapreduce_MRLostNodes | Number of nodes allocated to MapReduce marked in a LOST state. | MRLostNodes | MRLostNodes | Count | Sum |
| aws_elasticmapreduce_MRUnhealthyNodes | Number of nodes available to MapReduce jobs marked in an UNHEALTHY state. | MRUnhealthyNodes | MRUnhealthyNodes | Count | Sum |
| aws_elasticmapreduce_MRDecommissionedNodes | Number of nodes allocated to MapReduce applications marked in a DECOMMISSIONED state. | MRDecommissionedNodes | MRDecommissionedNodes | Count | Sum |
| aws_elasticmapreduce_MRRebootedNodes | Number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state. | MRRebootedNodes | MRRebootedNodes | Count | Sum |
| aws_elasticmapreduce_S3BytesWritten | Number of bytes written to Amazon S3. | S3BytesWritten | S3BytesWritten | Bytes Written | Sum |
| aws_elasticmapreduce_S3BytesRead | Number of bytes read from Amazon S3. | S3BytesRead | S3BytesRead | Bytes Read | Sum |
| aws_elasticmapreduce_MissingBlocks | Number of blocks for which HDFS has no replicas. These might be corrupt blocks. | MissingBlocks | MissingBlocks | Count | Sum |
| aws_elasticmapreduce_TotalLoad | Total number of concurrent data transfers. | TotalLoad | TotalLoad | Count | Sum |
| aws_elasticmapreduce_MemoryTotalMB | Total amount of memory in the cluster. | MemoryTotalMB | MemoryTotalMB | Bytes | Sum |
| aws_elasticmapreduce_MemoryReservedMB | Amount of memory reserved. | MemoryReservedMB | MemoryReservedMB | Bytes | Sum |
| aws_elasticmapreduce_MemoryAvailableMB | Amount of memory available to be allocated. | MemoryAvailableMB | MemoryAvailableMB | Bytes | Sum |
| aws_elasticmapreduce_MemoryAllocatedMB | Amount of memory allocated to the cluster. | MemoryAllocatedMB | MemoryAllocatedMB | Bytes | Sum |
| aws_elasticmapreduce_PendingDeletionBlocks | Number of blocks marked for deletion. | PendingDeletionBlocks | PendingDeletionBlocks | Count | Sum |
| aws_elasticmapreduce_UnderReplicatedBlocks | Number of blocks that need to be replicated one or more times. | UnderReplicatedBlocks | UnderReplicatedBlocks | Count | Sum |
| aws_elasticmapreduce_dfs_FSNamesystem_PendingReplicationBlocks | Status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests. | dfsPendingReplicationBlocks | dfs.FSNamesystem.PendingReplicationBlocks | Count | Average |
| aws_elasticmapreduce_ContainerPendingRatio | Ratio of pending containers to containers allocated (ContainerPendingRatio = ContainerPending / ContainerAllocated). If ContainerAllocated = 0, ContainerPendingRatio = ContainerPending. The value of ContainerPendingRatio represents a number, not a percentage. This value is useful for scaling cluster resources based on container allocation behavior. | ContainerPendingRatio | Container Pending Ratio | Count | Sum |
| aws_elasticmapreduce_AppsFailed | Number of applications submitted to YARN that have failed to complete. | AppsFailed | Apps Failed | Count | Sum |
| aws_elasticmapreduce_YARNMemoryAvailablePercentage | Percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB). This value is useful for scaling cluster resources based on YARN memory usage. | YARNMemoryAvailablePercentage | YARN Memory Available Percentage | Percent | Average |
| cloud.instance.state | n/a | | Status/State | n/a | n/a |
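
The two derived metrics in the table, ContainerPendingRatio and YARNMemoryAvailablePercentage, are defined by short formulas, and a worked example may make them easier to check against raw values. The following is a minimal sketch, not OpsRamp or AWS code. The function names are hypothetical, scaling the memory ratio by 100 is an assumption made to match the Percent unit, and returning None when MemoryTotalMB is 0 is also an assumption, since the table does not define that case.

```python
from typing import Optional


def container_pending_ratio(container_pending: float, container_allocated: float) -> float:
    """ContainerPending / ContainerAllocated, as described in the table.

    When ContainerAllocated is 0, the ratio is defined as ContainerPending itself.
    The result is a plain number, not a percentage.
    """
    if container_allocated == 0:
        return container_pending
    return container_pending / container_allocated


def yarn_memory_available_percentage(
    memory_available_mb: float, memory_total_mb: float
) -> Optional[float]:
    """MemoryAvailableMB / MemoryTotalMB, expressed as a percentage.

    Assumption: the ratio is multiplied by 100 to match the Percent unit.
    Assumption: None is returned when MemoryTotalMB is 0 (case not defined in the table).
    """
    if memory_total_mb == 0:
        return None
    return 100.0 * memory_available_mb / memory_total_mb


if __name__ == "__main__":
    print(container_pending_ratio(6, 3))                 # 2.0
    print(container_pending_ratio(4, 0))                 # 4
    print(yarn_memory_available_percentage(2048, 8192))  # 25.0
```

A ContainerPendingRatio that stays above 1 means more containers are waiting than are allocated, which is the kind of signal the table describes as useful for scaling cluster resources.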