Dive deeper into AWS CloudWatch with advanced commands, scripting techniques, automation strategies, and essential best practices
AWS CloudWatch is a powerful monitoring and observability service offered by Amazon Web Services (AWS). It is designed to help developers, DevOps engineers, and IT teams monitor the health and performance of AWS resources, applications, and on-premises infrastructure.
At its core, CloudWatch collects and tracks metrics, monitors logs, and sets alarms to notify you when something needs attention. Think of it as a “watchdog” for your cloud environment—it keeps track of everything happening in your AWS resources and lets you know when something goes wrong or requires optimization.
AWS CloudWatch provides several benefits that make it a go-to tool for monitoring in cloud environments. Whether you’re running a single EC2 instance or managing a large-scale distributed application, CloudWatch helps maintain performance, reliability, and cost-efficiency.
Resource Optimization
Cost Savings
Proactive Issue Resolution
Seamless Integration
Let’s say you have a web application running on an EC2 instance. Using AWS CloudWatch:
When the CPU usage exceeds 80% for 5 minutes, you will get a notification. This enables you to investigate and resolve the issue quickly, ensuring a smooth user experience.
Monitoring is critical in cloud environments because:
Example: Suppose you’re hosting a gaming application. If the servers handling multiplayer matches are overloaded, players will experience lag. Monitoring ensures such situations are flagged immediately, enabling you to scale up resources or debug the issue.
Metrics are data points that represent the performance and health of your AWS resources over time. For example, the percentage of CPU being used by an EC2 instance or the number of requests handled by an API Gateway are metrics.
Metrics are organized into namespaces (like AWS/EC2 or AWS/Lambda) and include dimensions that add context (e.g., instance ID or region).
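To make the relationship between namespace, metric name, and dimensions concrete, here is a minimal Python sketch that assembles a dictionary shaped like a CloudWatch PutMetricData request. Nothing is sent to AWS. The helper name build_put_metric_request and the Environment dimension are illustrative assumptions, not part of any AWS SDK.

```python
def build_put_metric_request(namespace, metric_name, value, dimensions=None, unit="None"):
    """Return a dict shaped like a PutMetricData request (nothing is sent)."""
    datum = {"MetricName": metric_name, "Value": float(value), "Unit": unit}
    if dimensions:
        # Dimensions are name/value pairs that narrow a metric to one resource.
        datum["Dimensions"] = [
            {"Name": name, "Value": val} for name, val in sorted(dimensions.items())
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


request = build_put_metric_request(
    "MyApplication",
    "ActiveUsers",
    123,
    dimensions={"Environment": "production"},  # hypothetical dimension
    unit="Count",
)
print(request)
```

The same metric name with different dimension values is treated as a separate metric, which is why the dimensions travel with each data point.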
AWS services generate default metrics automatically, such as:
Metric: CPUUtilization
Namespace: AWS/EC2
You can define your own metrics for specific use cases, like monitoring the number of active users on your application.
Let’s say you want to monitor the number of active users:
aws cloudwatch put-metric-data \
--metric-name ActiveUsers \
--namespace MyApplication \
--value 123
This command publishes a custom metric named ActiveUsers under the MyApplication namespace with a value of 123.
CloudWatch Logs helps you collect, monitor, and analyze log data from AWS resources or your applications. Logs provide detailed insights into application behavior and issues.
Use Cases:
From EC2 Instances:
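# The agent is typically not available from the default Ubuntu/Debian apt repositories,
# so download the package from AWS (on Amazon Linux: sudo yum install amazon-cloudwatch-agent).
wget https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i -E ./amazon-cloudwatch-agent.deb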
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
-a fetch-config \
-m ec2 \
-c file:/path/to/config.json \
-s
This starts the agent with a configuration file (config.json) specifying which logs to forward.
From Lambda Functions:
START RequestId: 1234 Version: $LATEST
2023-12-18T12:00:00.000Z INFO Successfully processed request.
END RequestId: 1234
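As a rough illustration of how these lines fit together, the following Python sketch groups application output by the request ID found in the START and END markers. It works on the sample lines above only; real Lambda log streams contain additional line types, and the function name group_by_request is a made-up helper.

```python
import re

LOG_LINES = [
    "START RequestId: 1234 Version: $LATEST",
    "2023-12-18T12:00:00.000Z INFO Successfully processed request.",
    "END RequestId: 1234",
]

MARKER = re.compile(r"^(START|END) RequestId: (\S+)")


def group_by_request(lines):
    """Collect the application lines that appear between START and END markers."""
    requests, current = {}, None
    for line in lines:
        match = MARKER.match(line)
        if match and match.group(1) == "START":
            current = match.group(2)
            requests[current] = []
        elif match and match.group(1) == "END":
            current = None
        elif current is not None:
            requests[current].append(line)
    return requests


grouped = group_by_request(LOG_LINES)
print(grouped)
```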
By default, logs are retained indefinitely. You can set a retention policy on a log group (for example, with the aws logs put-retention-policy command) to delete old logs automatically.
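One way to set retention is the aws logs put-retention-policy command. The Python sketch below only assembles that command line as a string so it can be reviewed before running; it does not call AWS. The log group name is a placeholder and 30 days is an arbitrary choice (CloudWatch Logs accepts only certain retention values, of which 30 is one).

```python
import shlex


def retention_command(log_group, days):
    """Build (but do not run) an `aws logs put-retention-policy` command line."""
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    args = [
        "aws", "logs", "put-retention-policy",
        "--log-group-name", log_group,
        "--retention-in-days", str(days),
    ]
    # shlex.join quotes any argument that the shell would otherwise split.
    return shlex.join(args)


# Placeholder log group name; substitute your own.
command = retention_command("/aws/lambda/your-lambda-function-name", 30)
print(command)
```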
CloudWatch Alarms monitor metrics and take action when thresholds are crossed. Actions include sending notifications or triggering an automated response.
For example, an alarm can notify you when an EC2 instance’s CPU usage exceeds 80% for 5 minutes.
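The threshold logic can be sketched in a few lines of Python. This is a simplified model, not CloudWatch's actual evaluation: it assumes one averaged value per period, requires every evaluated period to breach, and ignores missing-data handling. The function name alarm_state and the sample values are invented for illustration.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return "ALARM", "OK", or "INSUFFICIENT_DATA" for the most recent periods.

    `datapoints` holds one averaged value per period, oldest first.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # GreaterThanThreshold: every evaluated period must be strictly above.
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


# One 5-minute (300 s) average per entry; values are made up.
cpu_averages = [42.0, 55.5, 91.2]
print(alarm_state(cpu_averages, threshold=80, evaluation_periods=1))
print(alarm_state(cpu_averages, threshold=80, evaluation_periods=2))
```

With one evaluation period, only the latest average (91.2) is checked; with two, the earlier 55.5 keeps the alarm from firing.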
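The example below creates an alarm on CPUUtilization for an EC2 instance, using the CPUUtilization metric from the AWS/EC2 namespace. The --dimensions option scopes the alarm to a single instance; the instance ID and SNS topic ARN are placeholders.
aws cloudwatch put-metric-alarm \
--alarm-name HighCPUUsage \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:MySNSTopic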
This command creates an alarm named HighCPUUsage for an EC2 instance.
CloudWatch Dashboards are visual displays that let you monitor multiple metrics from various AWS services in one place. They help you track the performance of your resources in near real time, without having to navigate between different sections of the CloudWatch console.
Why use Dashboards?
Steps:
Give the dashboard a name, such as AppHealthDashboard.
For example, you can create a dashboard that tracks:
Command to Create a Dashboard:
aws cloudwatch put-dashboard \
--dashboard-name AppHealthDashboard \
--dashboard-body '{"widgets":[{"type":"metric","x":0,"y":0,"width":6,"height":6,"properties":{"metrics":[["AWS/EC2","CPUUtilization","InstanceId","i-1234567890abcdef0"]],"title":"EC2 CPU Usage","view":"timeSeries","stacked":false,"region":"us-east-1","period":300}}]}'
This command creates a dashboard named AppHealthDashboard with a widget that tracks CPU utilization for a specific EC2 instance.
Example 1: Visualizing EC2 CPU Usage: You can display the CPU usage over time, which helps you see if the instance is under heavy load or performing optimally.
Example 2: Visualizing Lambda Invocation Errors: Add a metric to visualize how often your Lambda function fails, allowing you to identify issues that require attention.
You can add widgets for various AWS services by selecting the desired metric or log group when creating the dashboard. For instance, you can combine metrics from EC2, Lambda, S3, etc., into a single dashboard for comprehensive monitoring.
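Hand-writing the dashboard JSON gets error-prone once there are several widgets. As a sketch, the Python below builds a two-widget dashboard body (EC2 CPU and Lambda errors) and serializes it to a string. The helper metric_widget and the function name my-function are hypothetical; the widget fields mirror the ones used in the command above.

```python
import json


def metric_widget(title, metric, x, y, region="us-east-1", period=300):
    """Return one dashboard widget; `metric` is [namespace, name, dim, value]."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 6, "height": 6,
        "properties": {
            "metrics": [metric],
            "title": title,
            "view": "timeSeries",
            "stacked": False,
            "region": region,
            "period": period,
        },
    }


widgets = [
    metric_widget(
        "EC2 CPU Usage",
        ["AWS/EC2", "CPUUtilization", "InstanceId", "i-1234567890abcdef0"],
        x=0, y=0,
    ),
    metric_widget(
        "Lambda Errors",
        ["AWS/Lambda", "Errors", "FunctionName", "my-function"],  # hypothetical function
        x=6, y=0,
    ),
]
dashboard_body = json.dumps({"widgets": widgets})
print(dashboard_body)
```

The resulting string is the kind of value the --dashboard-body option expects; placing the second widget at x=6 puts it next to the first instead of on top of it.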
CloudWatch Logs Insights lets you query and analyze log data from CloudWatch Logs. This is particularly useful for identifying patterns or troubleshooting issues, because you can run SQL-like queries, written as a pipeline of commands separated by | characters, on your log data.
With CloudWatch Logs Insights, you can create custom queries to analyze your log data. For instance, you can search for error messages, performance issues, or specific events in your logs. The following query returns the 20 most recent log events whose message contains ERROR:
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
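To show what each stage of that pipeline does, here is a small Python sketch that applies the same filter, sort, and limit steps to an in-memory list. It is only an analogy: the real query runs inside the service, and the events below are made up.

```python
import re

# Made-up events standing in for log data.
events = [
    {"@timestamp": "2023-12-18T12:00:00Z", "@message": "INFO request ok"},
    {"@timestamp": "2023-12-18T12:01:00Z", "@message": "ERROR database timeout"},
    {"@timestamp": "2023-12-18T12:03:00Z", "@message": "ERROR upstream 502"},
]


def latest_errors(events, limit=20):
    """Mimic: filter @message like /ERROR/ | sort @timestamp desc | limit N."""
    matching = [e for e in events if re.search(r"ERROR", e["@message"])]
    # ISO 8601 timestamps in the same format and zone sort correctly as strings.
    matching.sort(key=lambda e: e["@timestamp"], reverse=True)
    return matching[:limit]


result = latest_errors(events)
for event in result:
    print(event["@timestamp"], event["@message"])
```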
Let’s say you want to find trends in the errors that occur in your application logs over time. The following query could help:
fields @timestamp, @message
| filter @message like /error/
| stats count() by bin(5m)
CloudWatch Logs Insights lets you visualize query results as graphs, making it easier to see trends or patterns. For example, you can chart the error counts over time to see whether there is a spike in any particular period.
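The bin(5m) aggregation groups events into fixed 5-minute windows and counts them per window. A local Python sketch of the same idea, using invented sample events and a hypothetical helper error_counts_by_bin, might look like this:

```python
from collections import Counter
from datetime import datetime, timezone

BIN_SECONDS = 300  # 5 minutes, matching bin(5m)


def error_counts_by_bin(events):
    """Count messages containing "error" per 5-minute window (window start, UTC)."""
    counts = Counter()
    for timestamp, message in events:
        if "error" not in message:
            continue
        epoch = int(timestamp.timestamp())
        bin_start = epoch - epoch % BIN_SECONDS  # round down to the window start
        counts[datetime.fromtimestamp(bin_start, tz=timezone.utc)] += 1
    return dict(sorted(counts.items()))


def utc(hour, minute):
    """Build a UTC timestamp on an arbitrary sample date."""
    return datetime(2023, 12, 18, hour, minute, tzinfo=timezone.utc)


sample = [
    (utc(12, 1), "error: database timeout"),
    (utc(12, 4), "error: upstream 502"),
    (utc(12, 7), "request ok"),
    (utc(12, 8), "error: disk full"),
]
for window, count in error_counts_by_bin(sample).items():
    print(window.isoformat(), count)
```

Here the 12:01 and 12:04 errors fall into the 12:00 window and the 12:08 error into the 12:05 window, which is the shape of data a time-series chart of error counts would plot.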
By using these advanced features, you can move beyond basic monitoring and dive deep into the behavior of your systems, optimizing their performance and resolving issues proactively.
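AWS CloudWatch provides two types of monitoring for EC2 instances: basic (default) monitoring, which publishes metrics at 5-minute intervals, and detailed monitoring, which publishes them at 1-minute intervals for an additional charge.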
How to Enable Detailed Monitoring for EC2:
Go to EC2 Dashboard → Instances.
Select an EC2 instance.
Choose Actions → Monitor and troubleshoot → Manage detailed monitoring.
Enable Detailed Monitoring.
Command to Enable Detailed Monitoring:
aws ec2 monitor-instances --instance-ids i-1234567890abcdef0
Each AWS service can be monitored with CloudWatch, but the configuration steps can vary depending on the resource.
EC2 Monitoring:
Lambda Monitoring:
You can write custom log messages with the console.log() method in your Lambda function code.
You can access Lambda logs in CloudWatch by navigating to CloudWatch Console → Logs → Log Groups → /aws/lambda/your-lambda-function-name.
The CloudWatch Agent is useful for collecting custom logs and metrics from EC2 instances (such as application logs, system metrics, etc.) and sending them to CloudWatch.
Steps to Install CloudWatch Agent:
SSH into your EC2 instance.
Install the CloudWatch Agent by running the following command:
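# The agent is typically not available from the default Ubuntu/Debian apt repositories,
# so download the package from AWS (on Amazon Linux: sudo yum install amazon-cloudwatch-agent).
wget https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i -E ./amazon-cloudwatch-agent.deb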
Configure the CloudWatch Agent using the amazon-cloudwatch-agent-config-wizard:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
Start the CloudWatch Agent:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
The logs you selected in the wizard (such as /var/log/syslog or custom application logs) are now sent to CloudWatch.
If your application generates logs that you want to monitor, you can configure the CloudWatch Agent to send these custom logs to CloudWatch.
Example:
If your application writes logs to a file like /var/log/myapp.log, you can add this to the CloudWatch Agent configuration so that it sends the logs to CloudWatch.
Example configuration for custom log:
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/myapp.log",
            "log_group_name": "MyAppLogs",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
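When several files need forwarding, a configuration like the one above can be generated from a mapping rather than edited by hand. The Python sketch below does that; the helper agent_logs_config and the SystemLogs group name are assumptions made for illustration, and the output still has to be saved and loaded into the agent separately.

```python
import json


def agent_logs_config(files):
    """Build the "logs" section of an agent config from {file_path: log_group}."""
    collect_list = [
        {
            "file_path": path,
            "log_group_name": group,
            "log_stream_name": "{instance_id}",  # placeholder the agent expands
        }
        for path, group in files.items()
    ]
    return {"logs": {"logs_collected": {"files": {"collect_list": collect_list}}}}


config = agent_logs_config({
    "/var/log/myapp.log": "MyAppLogs",
    "/var/log/syslog": "SystemLogs",  # hypothetical second log group
})
print(json.dumps(config, indent=2))
```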
This configuration sends /var/log/myapp.log to CloudWatch under the log group MyAppLogs.
After the agent loads the new configuration, entries from myapp.log are available in CloudWatch for near-real-time monitoring.
To use AWS CloudWatch, you need the appropriate IAM permissions. These permissions are usually granted through IAM roles or policies attached to users, groups, or EC2 instances.
Example of a Basic CloudWatch Permissions Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData",
        "cloudwatch:GetMetricData",
        "cloudwatch:DescribeAlarms",
        "logs:*"
      ],
      "Resource": "*"
    }
  ]
}
This policy allows a user or role to publish metric data (PutMetricData), retrieve metric data (GetMetricData), describe alarms, and manage logs.
If you’re using the CloudWatch Agent on an EC2 instance, you need to provide an IAM role with permissions to write logs and metrics to CloudWatch.
IAM Policy for CloudWatch Agent:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData",
        "logs:PutLogEvents",
        "logs:CreateLogStream",
        "logs:CreateLogGroup"
      ],
      "Resource": "*"
    }
  ]
}
If you encounter access errors, ensure that the IAM role or policy attached to your user/EC2 instance has the correct permissions for CloudWatch. You may need to add or modify permissions to allow access to CloudWatch resources.
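When tracking down an access error, it can help to check which actions a policy document actually covers. The Python sketch below is a rough local check under simplifying assumptions: it looks only at Allow statements given as a list, matches wildcards such as logs:*, and ignores Deny statements, conditions, and resource restrictions. The function missing_actions is a made-up helper, not an IAM API.

```python
from fnmatch import fnmatchcase


def missing_actions(policy, required):
    """Return the required actions that no Allow statement in `policy` covers."""
    allowed = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # IAM accepts either a single string or a list here.
        allowed.extend([actions] if isinstance(actions, str) else actions)
    return [
        action for action in required
        # Action names are compared case-insensitively.
        if not any(fnmatchcase(action.lower(), pattern.lower()) for pattern in allowed)
    ]


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "cloudwatch:GetMetricData",
                "cloudwatch:DescribeAlarms",
                "logs:*",
            ],
            "Resource": "*",
        }
    ],
}

result = missing_actions(
    policy,
    ["cloudwatch:PutMetricData", "logs:PutLogEvents", "cloudwatch:PutMetricAlarm"],
)
print(result)
```

In this example only cloudwatch:PutMetricAlarm is reported, so a user with the basic policy could publish metrics and write logs but not create the alarm shown earlier.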