Complete guide to deploying containerized microservices to AWS ECS Fargate with production-ready configurations, covering task definitions, service discovery, auto-scaling, load balancing, and monitoring.
Welcome to Part 4 of our comprehensive series on building production-grade microservices on AWS ECS. In this installment, we’ll deploy our containerized applications to AWS ECS Fargate, configure service discovery, implement auto-scaling, and set up production-ready load balancing.
In this phase, we’ll create a complete production deployment including:
- Fargate task definitions and services for Nginx, Flask, and Redis
- Service discovery with AWS Cloud Map and ECS Service Connect
- CPU-based auto-scaling for the Nginx and Flask services
- An internet-facing Application Load Balancer with health checks
- Optional Route53 DNS and ACM SSL termination
- CloudWatch logging for every container
Our ECS deployment follows a modern, scalable architecture:
┌─────────────────────────────────────────────────────────────────┐
│                        Internet Traffic                         │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Route53 DNS (Optional)                      │
│  • Custom Domain                                                 │
│  • ACM SSL Certificate                                           │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Application Load Balancer (ALB)                  │
│  • HTTP/HTTPS Listeners                                          │
│  • Health Checks                                                 │
│  • SSL Termination                                               │
│  • Cross-Zone Load Balancing                                     │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                      ECS Fargate Services                        │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │  Nginx Service  │  │  Flask Service  │  │  Redis Service  │  │
│  │  • 2 Tasks      │  │  • 2 Tasks      │  │  • 1 Task       │  │
│  │  • Auto-scale   │  │  • Auto-scale   │  │  • Service      │  │
│  │  • Health       │  │  • Health       │  │    Connect      │  │
│  │    Checks       │  │    Checks       │  │                 │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└────────────────────────┬────────────────────────────────────────┘
                         │
            ┌────────────┴────────────┐
            ▼                         ▼
┌─────────────────────┐     ┌─────────────────────┐
│   RDS PostgreSQL    │     │   Service Connect   │
│   • Multi-AZ        │     │   • Cloud Map DNS   │
│   • Encrypted       │     │   • Health Checks   │
│   • Backups         │     │   • Load Balancing  │
└─────────────────────┘     └─────────────────────┘
Before we begin, ensure you have completed the previous phases of this series. This part assumes the VPC, subnets, security groups, ECS cluster, IAM task roles, ECR repositories, and RDS instance from earlier parts already exist in your Terraform state.
Ensure your AWS credentials have the following permissions:
- ecs:CreateService, ecs:UpdateService, ecs:RegisterTaskDefinition
- elasticloadbalancing:CreateLoadBalancer, elasticloadbalancing:CreateTargetGroup
- application-autoscaling:RegisterScalableTarget
- servicediscovery:CreateNamespace, servicediscovery:CreateService
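If you manage these permissions in Terraform, here is a minimal sketch of an equivalent customer-managed policy. The policy name is a placeholder, and Resource = "*" is for brevity only; scope it more tightly for real use:

# Hedged sketch: the deployment permissions above as an IAM policy
resource "aws_iam_policy" "ecs_deployer" {
  name = "${var.project_name}-ecs-deployer"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "ecs:CreateService",
        "ecs:UpdateService",
        "ecs:RegisterTaskDefinition",
        "elasticloadbalancing:CreateLoadBalancer",
        "elasticloadbalancing:CreateTargetGroup",
        "application-autoscaling:RegisterScalableTarget",
        "servicediscovery:CreateNamespace",
        "servicediscovery:CreateService"
      ]
      Resource = "*" # narrow this to specific ARNs in production
    }]
  })
}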
Let’s deploy our production-ready ECS services step by step.

First, we’ll set up AWS Cloud Map for service-to-service communication.
Add to terraform/main.tf, or create a new file terraform/ecs.tf:
# ============================================
# Service Discovery Namespace
# ============================================
resource "aws_service_discovery_private_dns_namespace" "main" {
name = "${var.project_name}.local"
description = "Private DNS namespace for service discovery"
vpc = aws_vpc.main.id
tags = {
Name = "${var.project_name}-service-discovery"
}
}
# ============================================
# CloudWatch Log Groups
# ============================================
resource "aws_cloudwatch_log_group" "redis" {
name = "/ecs/${var.project_name}/redis"
retention_in_days = 7
tags = {
Name = "${var.project_name}-redis-logs"
}
}
resource "aws_cloudwatch_log_group" "flask_app" {
name = "/ecs/${var.project_name}/flask-app"
retention_in_days = 7
tags = {
Name = "${var.project_name}-flask-app-logs"
}
}
resource "aws_cloudwatch_log_group" "nginx" {
name = "/ecs/${var.project_name}/nginx"
retention_in_days = 7
tags = {
Name = "${var.project_name}-nginx-logs"
}
}
Service Discovery Benefits:
- Services reach each other by stable DNS names (redis, flask-app) instead of hardcoded IPs
- Tasks are registered and deregistered automatically as they start and stop
- Service Connect adds per-service health checks and client-side load balancing
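Once the namespace is deployed, you can confirm it registered with Cloud Map (the region matches the one used later in this part):

# Verify the private DNS namespace exists
aws servicediscovery list-namespaces \
  --region ap-south-1 \
  --query 'Namespaces[].{Name:Name,Type:Type}'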
Now we’ll create production-ready ECS task definitions and services for our microservices.
#### 2.1: Redis Service
We’ll deploy Redis as an ECS service for cost optimization (instead of ElastiCache):
# ============================================
# Redis Task Definition and Service
# ============================================
resource "aws_ecs_task_definition" "redis" {
family = "${var.project_name}-redis"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = "256"
memory = "512"
execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
task_role_arn = aws_iam_role.ecs_task_role.arn
container_definitions = jsonencode([
{
name = "redis"
image = "redis:7-alpine"
essential = true
portMappings = [
{
containerPort = 6379
protocol = "tcp"
name = "redis"
}
]
healthCheck = {
command = ["CMD-SHELL", "redis-cli ping || exit 1"]
interval = 30
timeout = 5
retries = 3
startPeriod = 60
}
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.redis.name
"awslogs-region" = var.aws_region
"awslogs-stream-prefix" = "redis"
}
}
}
])
tags = {
Name = "${var.project_name}-redis-task"
}
}
resource "aws_ecs_service" "redis" {
name = "${var.project_name}-redis"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.redis.arn
desired_count = 1
launch_type = "FARGATE"
network_configuration {
subnets = aws_subnet.private[*].id
security_groups = [aws_security_group.ecs_tasks.id]
assign_public_ip = false
}
service_connect_configuration {
enabled = true
namespace = aws_service_discovery_private_dns_namespace.main.arn
service {
port_name = "redis"
client_alias {
port = 6379
dns_name = "redis"
}
}
}
tags = {
Name = "${var.project_name}-redis-service"
}
}
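One caveat: Fargate task storage is ephemeral, so this Redis instance is suitable for caching only. As a hedged sketch, you could add a command override to the redis container definition above to cap memory below the task’s 512 MB and evict old keys; the exact values are assumptions, not part of the original config:

# Add inside the "redis" container definition (alongside image/portMappings)
command = ["redis-server", "--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lru"]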
#### 2.2: Flask Application Service
Our Flask API service with database integration and Redis caching:
resource "aws_ecs_task_definition" "flask_app" {
family = "${var.project_name}-flask-app"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = "512"
memory = "1024"
execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
task_role_arn = aws_iam_role.ecs_task_role.arn
container_definitions = jsonencode([
{
name = "flask-app"
image = "${aws_ecr_repository.flask_app.repository_url}:latest"
essential = true
environment = [
{
name = "DB_HOST"
value = split(":", aws_db_instance.main.endpoint)[0]
},
{
name = "DB_PORT"
value = "5432"
},
{
name = "DB_NAME"
value = aws_db_instance.main.db_name
},
{
name = "DB_USER"
value = var.db_username
},
{
name = "DB_PASSWORD"
value = var.db_password
},
{
name = "REDIS_HOST"
value = "redis"
},
{
name = "REDIS_PORT"
value = "6379"
},
{
name = "ENVIRONMENT"
value = var.environment
}
]
portMappings = [
{
containerPort = 5000
protocol = "tcp"
name = "flask-app"
}
]
healthCheck = {
command = ["CMD-SHELL", "curl -f http://localhost:5000/health || exit 1"]
interval = 30
timeout = 5
retries = 3
startPeriod = 60
}
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.flask_app.name
"awslogs-region" = var.aws_region
"awslogs-stream-prefix" = "flask-app"
}
}
}
])
tags = {
Name = "${var.project_name}-flask-app-task"
}
}
resource "aws_ecs_service" "flask_app" {
name = "${var.project_name}-flask-app"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.flask_app.arn
desired_count = 2
launch_type = "FARGATE"
network_configuration {
subnets = aws_subnet.private[*].id
security_groups = [aws_security_group.ecs_tasks.id]
assign_public_ip = false
}
service_connect_configuration {
enabled = true
namespace = aws_service_discovery_private_dns_namespace.main.arn
service {
port_name = "flask-app"
client_alias {
port = 5000
dns_name = "flask-app"
}
}
}
depends_on = [
aws_ecs_service.redis,
aws_lb_listener.http
]
tags = {
Name = "${var.project_name}-flask-app-service"
}
}
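A security note: DB_PASSWORD is injected above as a plain environment variable, which is visible to anyone who can read the task definition. A hedged sketch of the Secrets Manager alternative follows; the secret resource is an assumption, not defined elsewhere in this series:

# Hypothetical secret holding the database password
resource "aws_secretsmanager_secret" "db_password" {
  name = "${var.project_name}-db-password"
}

# In the flask-app container definition, replace the DB_PASSWORD entry
# in "environment" with a "secrets" entry:
#   secrets = [{
#     name      = "DB_PASSWORD"
#     valueFrom = aws_secretsmanager_secret.db_password.arn
#   }]
# The task execution role then needs secretsmanager:GetSecretValue.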
#### 2.3: Nginx Reverse Proxy Service
Our Nginx service acts as the entry point and load balancer:
resource "aws_ecs_task_definition" "nginx" {
family = "${var.project_name}-nginx"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = "256"
memory = "512"
execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
task_role_arn = aws_iam_role.ecs_task_role.arn
container_definitions = jsonencode([
{
name = "nginx"
image = "${aws_ecr_repository.nginx.repository_url}:latest"
essential = true
portMappings = [
{
containerPort = 80
protocol = "tcp"
name = "nginx"
}
]
healthCheck = {
command = ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost/nginx-health || exit 1"]
interval = 30
timeout = 5
retries = 3
startPeriod = 30
}
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.nginx.name
"awslogs-region" = var.aws_region
"awslogs-stream-prefix" = "nginx"
}
}
}
])
tags = {
Name = "${var.project_name}-nginx-task"
}
}
resource "aws_ecs_service" "nginx" {
name = "${var.project_name}-nginx"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.nginx.arn
desired_count = 2
launch_type = "FARGATE"
network_configuration {
subnets = aws_subnet.private[*].id
security_groups = [aws_security_group.ecs_tasks.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.nginx.arn
container_name = "nginx"
container_port = 80
}
service_connect_configuration {
enabled = true
namespace = aws_service_discovery_private_dns_namespace.main.arn
}
depends_on = [
aws_ecs_service.flask_app,
aws_lb_listener.http
]
tags = {
Name = "${var.project_name}-nginx-service"
}
}
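For production rollouts you may also want the ECS deployment circuit breaker, which rolls back a failed deployment instead of endlessly cycling broken tasks. A sketch of the block you could add to each aws_ecs_service resource:

# Add inside aws_ecs_service.nginx / flask_app / redis
deployment_circuit_breaker {
  enable   = true
  rollback = true
}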
#### 2.4: Auto-Scaling Configuration
We'll configure CPU-based auto-scaling for both Flask and Nginx services:
resource "aws_appautoscaling_target" "flask_app" {
max_capacity = 4
min_capacity = 2
resource_id = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.flask_app.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "flask_app_cpu" {
name = "${var.project_name}-flask-app-cpu-scaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.flask_app.resource_id
scalable_dimension = aws_appautoscaling_target.flask_app.scalable_dimension
service_namespace = aws_appautoscaling_target.flask_app.service_namespace
target_tracking_scaling_policy_configuration {
target_value = 70.0
scale_in_cooldown = 300
scale_out_cooldown = 60
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
}
}
# ============================================
# Auto Scaling for Nginx
# ============================================
resource "aws_appautoscaling_target" "nginx" {
max_capacity = 4
min_capacity = 2
resource_id = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.nginx.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "nginx_cpu" {
name = "${var.project_name}-nginx-cpu-scaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.nginx.resource_id
scalable_dimension = aws_appautoscaling_target.nginx.scalable_dimension
service_namespace = aws_appautoscaling_target.nginx.service_namespace
target_tracking_scaling_policy_configuration {
target_value = 70.0
scale_in_cooldown = 300
scale_out_cooldown = 60
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
}
}
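Once the ALB from the next step is up, you can drive some CPU load and watch the scaling decisions. This assumes the hey load generator is installed; any HTTP load tool works:

# Generate ~2 minutes of load (ALB_DNS is set in the testing section below)
hey -z 2m -c 50 http://$ALB_DNS/api/stats

# Inspect scale-out/scale-in decisions for the Flask service
aws application-autoscaling describe-scaling-activities \
  --service-namespace ecs \
  --resource-id service/ecs-microservices-cluster/ecs-microservices-flask-app \
  --region ap-south-1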
We’ll set up a production-ready ALB with health checks and SSL termination capabilities.
Create terraform/alb.tf:
# ============================================
# Application Load Balancer
# ============================================
resource "aws_lb" "main" {
name = "${var.project_name}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = aws_subnet.public[*].id
enable_deletion_protection = false
enable_http2 = true
enable_cross_zone_load_balancing = true # always on for ALBs; this argument only matters for NLB/GWLB
tags = {
Name = "${var.project_name}-alb"
}
}
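Optionally, you can capture ALB access logs for traffic analysis. A hedged sketch of the block to add inside aws_lb.main; the S3 bucket is an assumption and must grant the regional ELB log-delivery account write access:

# Add inside resource "aws_lb" "main"
access_logs {
  bucket  = "my-alb-logs-bucket" # placeholder bucket name
  prefix  = "ecs-microservices"
  enabled = true
}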
# ============================================
# Target Groups
# ============================================
resource "aws_lb_target_group" "nginx" {
name = "${var.project_name}-nginx-tg"
port = 80
protocol = "HTTP"
vpc_id = aws_vpc.main.id
target_type = "ip"
health_check {
enabled = true
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 5
interval = 30
path = "/nginx-health"
matcher = "200"
}
deregistration_delay = 30
tags = {
Name = "${var.project_name}-nginx-tg"
}
}
# ============================================
# ALB Listeners
# ============================================
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.main.arn
port = "80"
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.nginx.arn
}
tags = {
Name = "${var.project_name}-http-listener"
}
}
# Optional: HTTPS Listener (uncomment when ACM certificate is ready)
# resource "aws_lb_listener" "https" {
# load_balancer_arn = aws_lb.main.arn
# port = "443"
# protocol = "HTTPS"
# ssl_policy = "ELBSecurityPolicy-TLS-1-2-2017-01"
# certificate_arn = aws_acm_certificate.main.arn
# default_action {
# type = "forward"
# target_group_arn = aws_lb_target_group.nginx.arn
# }
# tags = {
# Name = "${var.project_name}-https-listener"
# }
# }
# Optional: Redirect HTTP to HTTPS (uncomment when HTTPS is configured)
# resource "aws_lb_listener_rule" "redirect_http_to_https" {
# listener_arn = aws_lb_listener.http.arn
# priority = 1
# action {
# type = "redirect"
# redirect {
# port = "443"
# protocol = "HTTPS"
# status_code = "HTTP_301"
# }
# }
# condition {
# path_pattern {
# values = ["/*"]
# }
# }
# }
# ============================================
# Outputs
# ============================================
output "alb_dns_name" {
description = "ALB DNS name"
value = aws_lb.main.dns_name
}
output "alb_zone_id" {
description = "ALB Zone ID for Route53"
value = aws_lb.main.zone_id
}
output "alb_arn" {
description = "ALB ARN"
value = aws_lb.main.arn
}
Create terraform/route53.tf (only if you have a domain):
# ============================================
# Variables for Route53 (add to variables.tf)
# ============================================
# Add these to variables.tf:
# variable "domain_name" {
# description = "Domain name for the application"
# type = string
# default = "" # e.g., "ecs.yourdomain.com"
# }
#
# variable "create_route53_record" {
# description = "Create Route53 DNS record"
# type = bool
# default = false
# }
# ============================================
# ACM Certificate
# ============================================
resource "aws_acm_certificate" "main" {
count = var.create_route53_record ? 1 : 0
domain_name = var.domain_name
validation_method = "DNS"
lifecycle {
create_before_destroy = true
}
tags = {
Name = "${var.project_name}-certificate"
}
}
# ============================================
# Route53 Record for ACM Validation
# ============================================
# Note: This lookup assumes a Route53 hosted zone exists whose name matches
# var.domain_name. If you use a subdomain (e.g., ecs.yourdomain.com), create a
# hosted zone for it or query the parent zone instead. If your DNS lives in
# Cloudflare, skip this and add the records there (see the Cloudflare section below).
data "aws_route53_zone" "main" {
count = var.create_route53_record ? 1 : 0
name = var.domain_name
private_zone = false
}
resource "aws_route53_record" "cert_validation" {
count = var.create_route53_record ? 1 : 0
zone_id = data.aws_route53_zone.main[0].zone_id
name = tolist(aws_acm_certificate.main[0].domain_validation_options)[0].resource_record_name
type = tolist(aws_acm_certificate.main[0].domain_validation_options)[0].resource_record_type
records = [tolist(aws_acm_certificate.main[0].domain_validation_options)[0].resource_record_value]
ttl = 60
}
resource "aws_acm_certificate_validation" "main" {
count = var.create_route53_record ? 1 : 0
certificate_arn = aws_acm_certificate.main[0].arn
validation_record_fqdns = [aws_route53_record.cert_validation[0].fqdn]
}
# ============================================
# Route53 A Record for ALB
# ============================================
resource "aws_route53_record" "alb" {
count = var.create_route53_record ? 1 : 0
zone_id = data.aws_route53_zone.main[0].zone_id
name = var.domain_name
type = "A"
alias {
name = aws_lb.main.dns_name
zone_id = aws_lb.main.zone_id
evaluate_target_health = true
}
}
Now let’s deploy our production-ready ECS infrastructure.
Edit terraform/terraform.tfvars:
# ... existing variables ...
# Optional: Route53 DNS Configuration
create_route53_record = false # Set to true if you have a domain
domain_name = "" # e.g., "ecs.yourdomain.com"
cd terraform
# Validate configuration
terraform validate
# Review deployment plan
terraform plan
# Deploy infrastructure
terraform apply
What This Creates:
- A private Cloud Map namespace and three CloudWatch log groups
- Three Fargate services: Nginx (2 tasks), Flask (2 tasks), and Redis (1 task)
- An internet-facing ALB with a target group and HTTP listener
- Auto-scaling targets and CPU policies for the Nginx and Flask services
Deployment Time: Approximately 10-15 minutes
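If you’d rather block until everything settles instead of polling by hand, the built-in ECS waiter works well:

# Wait until all three services reach a steady state
aws ecs wait services-stable \
  --cluster ecs-microservices-cluster \
  --services ecs-microservices-nginx ecs-microservices-flask-app ecs-microservices-redis \
  --region ap-south-1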
# Watch ECS services
aws ecs describe-services \
--cluster ecs-microservices-cluster \
--services ecs-microservices-nginx ecs-microservices-flask-app ecs-microservices-redis \
--region ap-south-1
# Check running tasks
aws ecs list-tasks \
--cluster ecs-microservices-cluster \
--region ap-south-1
# View service events
aws ecs describe-services \
--cluster ecs-microservices-cluster \
--services ecs-microservices-nginx \
--region ap-south-1 \
--query 'services[0].events[0:5]'
Let’s thoroughly test our production deployment to ensure everything is working correctly.
First, get the ALB DNS name from Terraform:
terraform output alb_dns_name
Or from the AWS CLI:
aws elbv2 describe-load-balancers \
--names ecs-microservices-alb \
--region ap-south-1 \
--query 'LoadBalancers[0].DNSName' \
--output text
Wait 2-3 minutes for services to be fully healthy, then run these tests:
# Set ALB DNS
ALB_DNS="<your-alb-dns-name>"
# Test home endpoint
curl http://$ALB_DNS/
# Test health
curl http://$ALB_DNS/health
# Test stats
curl http://$ALB_DNS/api/stats
# Test Redis cache
curl http://$ALB_DNS/api/cache-test
# Create a user
curl -X POST http://$ALB_DNS/api/users \
-H "Content-Type: application/json" \
-d '{"username": "testuser"}'
# Get users
curl http://$ALB_DNS/api/users
# Get app info
curl http://$ALB_DNS/api/info
# Flask logs
aws logs tail /ecs/ecs-microservices/flask-app --follow --region ap-south-1
# Nginx logs
aws logs tail /ecs/ecs-microservices/nginx --follow --region ap-south-1
# Redis logs
aws logs tail /ecs/ecs-microservices/redis --follow --region ap-south-1
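For ad-hoc analysis beyond tailing, CloudWatch Logs Insights can filter across a whole log group. A sketch follows; the /ERROR/ pattern is just an example, and the date arithmetic assumes GNU date:

# Find recent errors in the Flask logs
QUERY_ID=$(aws logs start-query \
  --log-group-name /ecs/ecs-microservices/flask-app \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20' \
  --region ap-south-1 \
  --output text --query 'queryId')

# Fetch the results once the query completes
aws logs get-query-results --query-id $QUERY_ID --region ap-south-1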
ECS Service Connect enables seamless service-to-service communication using DNS names.
# Connect to Flask container using ECS Exec
TASK_ID=$(aws ecs list-tasks \
--cluster ecs-microservices-cluster \
--service-name ecs-microservices-flask-app \
--region ap-south-1 \
--query 'taskArns[0]' \
--output text | cut -d'/' -f3)
# Exec into container
aws ecs execute-command \
--cluster ecs-microservices-cluster \
--task $TASK_ID \
--container flask-app \
--interactive \
--command "/bin/bash" \
--region ap-south-1
# Inside container, test Redis connectivity
# redis-cli -h redis ping
# Should return: PONG
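One prerequisite worth flagging: execute-command only works when it is enabled on the service, and the service definitions in this part don’t show that flag, so treat this as a hedged addition:

# Add to aws_ecs_service.flask_app (and any service you want to exec into),
# then re-apply; only tasks launched afterwards are reachable.
enable_execute_command = true

# The task role also needs the SSM channel permissions:
#   ssmmessages:CreateControlChannel, ssmmessages:CreateDataChannel,
#   ssmmessages:OpenControlChannel, ssmmessages:OpenDataChannel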
If your domain is in Cloudflare but you want to use ACM for SSL:
1. Request an ACM certificate for your subdomain (e.g., ecs.yourdomain.com) with DNS validation
2. Add the validation CNAME record in Cloudflare for ecs (or your subdomain)
3. Once the certificate is issued, uncomment the HTTPS listener in alb.tf and update the certificate ARN:
resource "aws_lb_listener" "https" {
load_balancer_arn = aws_lb.main.arn
port = "443"
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS-1-2-2017-01"
certificate_arn = "arn:aws:acm:ap-south-1:ACCOUNT_ID:certificate/CERT_ID"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.nginx.arn
}
}
Then apply:
terraform apply
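Before pointing clients at HTTPS, you can confirm the certificate was actually issued:

# List issued certificates in the ALB's region
aws acm list-certificates \
  --certificate-statuses ISSUED \
  --region ap-south-1 \
  --query 'CertificateSummaryList[].DomainName'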
If tasks fail to start, inspect the stopped task:
aws ecs describe-tasks \
--cluster ecs-microservices-cluster \
--tasks <task-arn> \
--region ap-south-1
Common causes:
- The ECR image tag doesn’t exist or the execution role can’t pull it
- Private subnets have no NAT route to reach ECR and CloudWatch
- The container health check fails before startPeriod expires
If the ALB reports unhealthy targets, check target health:
aws elbv2 describe-target-health \
--target-group-arn <target-group-arn> \
--region ap-south-1
Common causes:
- The /nginx-health path isn’t served by the Nginx image
- The ECS task security group doesn’t allow traffic from the ALB security group on port 80
- The container needs longer to boot than the health check startPeriod allows

If Service Connect DNS names (redis, flask-app) don’t resolve, verify:
- All services are registered in the same Cloud Map namespace
- Each port_name matches the name in the task definition’s portMappings

If the Flask app can’t reach the database, check:
- The RDS instance is in the available state
- The database security group allows port 5432 from the ECS task security group

Approximate monthly costs for this deployment:

| Component | Configuration | Monthly Cost |
|---|---|---|
| NAT Gateways | 2x Multi-AZ | ~$65 |
| RDS PostgreSQL | Multi-AZ, db.t3.micro | ~$35 |
| ECS Fargate | 2 Nginx + 2 Flask + 1 Redis | ~$40 |
| Application Load Balancer | Internet-facing | ~$20 |
| Data Transfer | ALB to ECS | ~$5 |
| CloudWatch Logs | 7-day retention | ~$2 |
| **Total** | | **~$167** |
For Development/Testing:
- Use a single NAT gateway instead of one per AZ
- Run RDS single-AZ and stop it when idle
- Drop desired_count to 1 for the Nginx and Flask services
For Production:
- Keep Multi-AZ NAT gateways and RDS for availability
- Consider Fargate Spot or Compute Savings Plans for steady workloads
- Move Redis to ElastiCache if you need persistence or replication
✅ Part 4 Complete! Your production deployment now includes:
- Three Fargate services (Nginx, Flask, Redis) wired together with ECS Service Connect
- An internet-facing ALB with health checks and optional SSL termination
- CPU-based auto-scaling for the Nginx and Flask services
- CloudWatch log groups for every container
- Optional Route53 DNS and ACM certificate automation
Proceed to Part 5: CI/CD Pipeline where we’ll automate deployments with GitHub Actions and implement blue-green deployments.
This production deployment provides a robust, scalable, and secure foundation for running microservices on AWS ECS. The infrastructure is designed for high availability, cost optimization, and operational excellence.
Ready for automation? In Part 5, we’ll implement CI/CD pipelines to automate deployments and ensure continuous delivery!
Questions or feedback? Feel free to reach out in the comments below!