Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics purposes and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB. For more details, refer to the AWS documentation.
Log and metric types
You can collect the logs and metrics for Sumo Logic's Amazon EMR integration by following the below steps.
Configure metrics collection
- Collect CloudWatch Metrics with namespace
AWS/ElasticMapReduceusing the AWS Kinesis Firehose for Metrics source. For
AWS/ElasticMapReducemetrics and dimensions, refer to Amazon EMR CloudWatch metrics.
Configure logs collection
Collect Amazon EMR and Hadoop Logs using Amazon S3 source. Amazon EMR and Hadoop both produce log files that report status on the cluster. By default, these are written to the primary node in the /mnt/var/log/ directory. Depending on how you configured your cluster when you launched it, these logs may also be archived to Amazon S3. There are many types of logs written to the primary node. Amazon EMR writes step, bootstrap action, and instance state logs. Apache Hadoop writes logs to report the processing of jobs, tasks, and task attempts. Hadoop also records logs of its daemons.
Collect AWS CloudTrail Logs using AWS CloudTrail source. Amazon EMR is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Amazon EMR. CloudTrail captures all API calls for Amazon EMR as events. The calls captured include calls from the Amazon EMR console and code calls to the Amazon EMR API operations. If you create a trail, you can enable continuous delivery of CloudTrail events to an Amazon S3 bucket, including events for Amazon EMR. Using the information collected by CloudTrail, you can determine the request that was made to Amazon EMR, the IP address from which the request was made, who made the request, when it was made, and additional details.