Building Production-Grade ECS Microservices with CI/CD - Part 4: ECS Deployment

Complete guide to deploying containerized microservices to AWS ECS Fargate with production-ready configurations, covering task definitions, service discovery, auto-scaling, load balancing, and monitoring.

Welcome to Part 4 of our comprehensive series on building production-grade microservices on AWS ECS. In this installment, we’ll deploy our containerized applications to AWS ECS Fargate, configure service discovery, implement auto-scaling, and set up production-ready load balancing.

What We’ll Deploy

In this phase, we’ll create a complete production deployment including:

  1. ECS Task Definitions - Production-ready configurations for Flask, Nginx, and Redis services
  2. Application Load Balancer - High-performance load balancing with health checks and SSL termination
  3. ECS Services - Fargate-based services with Service Connect for seamless communication
  4. Auto-scaling Policies - CPU-based scaling to handle traffic spikes automatically
  5. Service Discovery - AWS Cloud Map integration for service-to-service communication
  6. Optional SSL/HTTPS - ACM certificates and Route53 DNS configuration

Production Deployment Architecture

Our ECS deployment follows a modern, scalable architecture:

┌─────────────────────────────────────────────────────────────────┐
│                    Internet Traffic                             │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│              Route53 DNS (Optional)                           │
│              • Custom Domain                                   │
│              • ACM SSL Certificate                             │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│            Application Load Balancer (ALB)                     │
│            • HTTP/HTTPS Listeners                              │
│            • Health Checks                                     │
│            • SSL Termination                                   │
│            • Cross-Zone Load Balancing                         │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ECS Fargate Services                        │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │   Nginx Service │  │  Flask Service  │  │  Redis Service  │ │
│  │   • 2 Tasks     │  │  • 2 Tasks      │  │  • 1 Task       │ │
│  │   • Auto-scale  │  │  • Auto-scale   │  │  • Service      │ │
│  │   • Health      │  │  • Health       │  │    Connect      │ │
│  │     Checks      │  │    Checks       │  │                 │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
└────────────────────────┬────────────────────────────────────────┘
                         │
            ┌────────────┴────────────┐
            ▼                        ▼
┌─────────────────────┐    ┌─────────────────────┐
│   RDS PostgreSQL    │    │   Service Connect    │
│   • Multi-AZ        │    │   • Cloud Map DNS    │
│   • Encrypted       │    │   • Health Checks    │
│   • Backups         │    │   • Load Balancing   │
└─────────────────────┘    └─────────────────────┘

Prerequisites

Before we begin, ensure you have completed the previous phases:

Infrastructure Requirements

  • Part 1 Complete: VPC, subnets, security groups, RDS database, ECR repositories
  • Part 2 Complete: ECS cluster, IAM roles, CloudWatch log groups
  • Part 3 Complete: Containerized applications pushed to ECR

AWS Permissions

Ensure your AWS credentials have the following permissions:

  • ECS: ecs:CreateService, ecs:UpdateService, ecs:RegisterTaskDefinition
  • ELB: elasticloadbalancing:CreateLoadBalancer, elasticloadbalancing:CreateTargetGroup
  • Auto Scaling: application-autoscaling:RegisterScalableTarget
  • Service Discovery: servicediscovery:CreateNamespace, servicediscovery:CreateService

Required Information

  • ECR repository URLs for Flask and Nginx images
  • RDS endpoint and credentials
  • VPC and subnet IDs
  • Security group IDs

Step-by-Step Implementation

Let’s deploy our production-ready ECS services step by step.

Step 1: Configure Service Discovery

First, we’ll set up AWS Cloud Map for service-to-service communication.

1.1: Create Service Discovery Namespace

Add to terraform/main.tf or create new file terraform/ecs.tf:

# ============================================
# Service Discovery Namespace
# ============================================

resource "aws_service_discovery_private_dns_namespace" "main" {
  name        = "${var.project_name}.local"
  description = "Private DNS namespace for service discovery"
  vpc         = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-service-discovery"
  }
}

# ============================================
# CloudWatch Log Groups
# ============================================

resource "aws_cloudwatch_log_group" "redis" {
  name              = "/ecs/${var.project_name}/redis"
  retention_in_days = 7

  tags = {
    Name = "${var.project_name}-redis-logs"
  }
}

resource "aws_cloudwatch_log_group" "flask_app" {
  name              = "/ecs/${var.project_name}/flask-app"
  retention_in_days = 7

  tags = {
    Name = "${var.project_name}-flask-app-logs"
  }
}

resource "aws_cloudwatch_log_group" "nginx" {
  name              = "/ecs/${var.project_name}/nginx"
  retention_in_days = 7

  tags = {
    Name = "${var.project_name}-nginx-logs"
  }
}

Service Discovery Benefits:

  • DNS-based service discovery for microservices communication
  • Health checks for automatic service registration
  • Load balancing across service instances
  • Service mesh capabilities with ECS Service Connect
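Rather than repeating the namespace ARN in every service's `service_connect_configuration`, you can set a cluster-wide default. This is a sketch to merge into the `aws_ecs_cluster.main` resource created in Part 2, not a new resource to add:

```hcl
# Optional: make the namespace the cluster-wide default for Service Connect.
# Services that omit a namespace in service_connect_configuration inherit it.
# Sketch only — merge into the aws_ecs_cluster.main resource from Part 2.
resource "aws_ecs_cluster" "main" {
  name = "${var.project_name}-cluster"

  service_connect_defaults {
    namespace = aws_service_discovery_private_dns_namespace.main.arn
  }
}
```

The per-service `namespace` arguments shown later in this post still work either way; the cluster default simply acts as a fallback.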

Step 2: Deploy ECS Services

Now we’ll create production-ready ECS task definitions and services for our microservices.

2.1: Redis Service Configuration

We’ll deploy Redis as an ECS service for cost optimization (instead of ElastiCache):

# ============================================
# Redis Task Definition and Service
# ============================================

resource "aws_ecs_task_definition" "redis" {
  family                   = "${var.project_name}-redis"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([
    {
      name      = "redis"
      image     = "redis:7-alpine"
      essential = true

      portMappings = [
        {
          containerPort = 6379
          protocol      = "tcp"
          name          = "redis"
        }
      ]

      healthCheck = {
        command     = ["CMD-SHELL", "redis-cli ping || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.redis.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "redis"
        }
      }
    }
  ])

  tags = {
    Name = "${var.project_name}-redis-task"
  }
}

resource "aws_ecs_service" "redis" {
  name            = "${var.project_name}-redis"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.redis.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  service_connect_configuration {
    enabled   = true
    namespace = aws_service_discovery_private_dns_namespace.main.arn

    service {
      port_name = "redis"
      client_alias {
        port     = 6379
        dns_name = "redis"
      }
    }
  }

  tags = {
    Name = "${var.project_name}-redis-service"
  }
}

2.2: Flask Application Service

Our Flask API service with database integration and Redis caching:

resource "aws_ecs_task_definition" "flask_app" {
  family                   = "${var.project_name}-flask-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([
    {
      name      = "flask-app"
      image     = "${aws_ecr_repository.flask_app.repository_url}:latest"
      essential = true

      environment = [
        {
          name  = "DB_HOST"
          value = split(":", aws_db_instance.main.endpoint)[0]
        },
        {
          name  = "DB_PORT"
          value = "5432"
        },
        {
          name  = "DB_NAME"
          value = aws_db_instance.main.db_name
        },
        {
          name  = "DB_USER"
          value = var.db_username
        },
        {
          name  = "DB_PASSWORD"
          value = var.db_password
        },
        {
          name  = "REDIS_HOST"
          value = "redis"
        },
        {
          name  = "REDIS_PORT"
          value = "6379"
        },
        {
          name  = "ENVIRONMENT"
          value = var.environment
        }
      ]

      portMappings = [
        {
          containerPort = 5000
          protocol      = "tcp"
          name          = "flask-app"
        }
      ]

      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:5000/health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.flask_app.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "flask-app"
        }
      }
    }
  ])

  tags = {
    Name = "${var.project_name}-flask-app-task"
  }
}

resource "aws_ecs_service" "flask_app" {
  name            = "${var.project_name}-flask-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.flask_app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  service_connect_configuration {
    enabled   = true
    namespace = aws_service_discovery_private_dns_namespace.main.arn

    service {
      port_name = "flask-app"
      client_alias {
        port     = 5000
        dns_name = "flask-app"
      }
    }
  }

  depends_on = [
    aws_ecs_service.redis,
    aws_lb_listener.http
  ]

  tags = {
    Name = "${var.project_name}-flask-app-service"
  }
}
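One caveat about the task definition above: `DB_PASSWORD` is injected as a plain environment variable, which exposes it in the ECS console and in Terraform state. For production, prefer the container-level `secrets` block backed by SSM Parameter Store or Secrets Manager. A hedged sketch (the `aws_ssm_parameter` resource is an assumption, not something created earlier in this series):

```hcl
# Sketch: store the password as a SecureString and inject it via "secrets"
# instead of the plain "environment" block. Assumed resource, adjust naming.
resource "aws_ssm_parameter" "db_password" {
  name  = "/${var.project_name}/db-password"
  type  = "SecureString"
  value = var.db_password
}

# Inside container_definitions, replace the DB_PASSWORD environment entry with:
#   secrets = [
#     {
#       name      = "DB_PASSWORD"
#       valueFrom = aws_ssm_parameter.db_password.arn
#     }
#   ]
# The task execution role also needs ssm:GetParameters on this parameter.
```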

2.3: Nginx Reverse Proxy Service

Our Nginx service acts as the entry point and load balancer:

resource "aws_ecs_task_definition" "nginx" {
  family                   = "${var.project_name}-nginx"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([
    {
      name      = "nginx"
      image     = "${aws_ecr_repository.nginx.repository_url}:latest"
      essential = true

      portMappings = [
        {
          containerPort = 80
          protocol      = "tcp"
          name          = "nginx"
        }
      ]

      healthCheck = {
        command     = ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost/nginx-health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 30
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.nginx.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "nginx"
        }
      }
    }
  ])

  tags = {
    Name = "${var.project_name}-nginx-task"
  }
}

resource "aws_ecs_service" "nginx" {
  name            = "${var.project_name}-nginx"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.nginx.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.nginx.arn
    container_name   = "nginx"
    container_port   = 80
  }

  service_connect_configuration {
    enabled   = true
    namespace = aws_service_discovery_private_dns_namespace.main.arn
  }

  depends_on = [
    aws_ecs_service.flask_app,
    aws_lb_listener.http
  ]

  tags = {
    Name = "${var.project_name}-nginx-service"
  }
}
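The services above do not enable the ECS deployment circuit breaker, which automatically rolls back a deployment whose tasks keep failing health checks. It is worth adding for production; a sketch of the nested block to merge into any of the `aws_ecs_service` resources above (shown standalone here, it is not valid on its own):

```hcl
# Optional: automatic rollback of failed deployments.
# Merge into an aws_ecs_service resource alongside network_configuration.
deployment_circuit_breaker {
  enable   = true
  rollback = true
}
```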

2.4: Auto-Scaling Configuration

We’ll configure CPU-based auto-scaling for both Flask and Nginx services:

resource "aws_appautoscaling_target" "flask_app" {
  max_capacity       = 4
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.flask_app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "flask_app_cpu" {
  name               = "${var.project_name}-flask-app-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.flask_app.resource_id
  scalable_dimension = aws_appautoscaling_target.flask_app.scalable_dimension
  service_namespace  = aws_appautoscaling_target.flask_app.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}

# ============================================
# Auto Scaling for Nginx
# ============================================

resource "aws_appautoscaling_target" "nginx" {
  max_capacity       = 4
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.nginx.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "nginx_cpu" {
  name               = "${var.project_name}-nginx-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.nginx.resource_id
  scalable_dimension = aws_appautoscaling_target.nginx.scalable_dimension
  service_namespace  = aws_appautoscaling_target.nginx.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
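CPU is not the only useful scaling signal. For the Nginx tier, tracking request count per target often follows traffic more directly than CPU. A sketch, assuming the `aws_lb.main` and `aws_lb_target_group.nginx` resources defined in Step 3; the target of 1000 is an assumed starting point, not a recommendation:

```hcl
# Alternative: scale Nginx on ALB requests per target instead of CPU.
# target_value is average requests per target; 1000 is an assumed starting point.
resource "aws_appautoscaling_policy" "nginx_requests" {
  name               = "${var.project_name}-nginx-request-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.nginx.resource_id
  scalable_dimension = aws_appautoscaling_target.nginx.scalable_dimension
  service_namespace  = aws_appautoscaling_target.nginx.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 1000

    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.nginx.arn_suffix}"
    }
  }
}
```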

Step 3: Configure Application Load Balancer

We’ll set up a production-ready ALB with health checks and SSL termination capabilities.

3.1: Create ALB Configuration

Create terraform/alb.tf:

# ============================================
# Application Load Balancer
# ============================================

resource "aws_lb" "main" {
  name               = "${var.project_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection       = false
  enable_http2                     = true
  enable_cross_zone_load_balancing = true

  tags = {
    Name = "${var.project_name}-alb"
  }
}

# ============================================
# Target Groups
# ============================================

resource "aws_lb_target_group" "nginx" {
  name        = "${var.project_name}-nginx-tg"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"

  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    path                = "/nginx-health"
    matcher             = "200"
  }

  deregistration_delay = 30

  tags = {
    Name = "${var.project_name}-nginx-tg"
  }
}

# ============================================
# ALB Listeners
# ============================================

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.nginx.arn
  }

  tags = {
    Name = "${var.project_name}-http-listener"
  }
}

# Optional: HTTPS Listener (uncomment when ACM certificate is ready)
# resource "aws_lb_listener" "https" {
#   load_balancer_arn = aws_lb.main.arn
#   port              = "443"
#   protocol          = "HTTPS"
#   ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
#   certificate_arn   = aws_acm_certificate.main.arn

#   default_action {
#     type             = "forward"
#     target_group_arn = aws_lb_target_group.nginx.arn
#   }

#   tags = {
#     Name = "${var.project_name}-https-listener"
#   }
# }

# Optional: Redirect HTTP to HTTPS (uncomment when HTTPS is configured)
# resource "aws_lb_listener_rule" "redirect_http_to_https" {
#   listener_arn = aws_lb_listener.http.arn
#   priority     = 1

#   action {
#     type = "redirect"
#     redirect {
#       port        = "443"
#       protocol    = "HTTPS"
#       status_code = "HTTP_301"
#     }
#   }

#   condition {
#     path_pattern {
#       values = ["/*"]
#     }
#   }
# }

# ============================================
# Outputs
# ============================================

output "alb_dns_name" {
  description = "ALB DNS name"
  value       = aws_lb.main.dns_name
}

output "alb_zone_id" {
  description = "ALB Zone ID for Route53"
  value       = aws_lb.main.zone_id
}

output "alb_arn" {
  description = "ALB ARN"
  value       = aws_lb.main.arn
}

Step 4: Optional - Route53 and ACM SSL

Create terraform/route53.tf (only if you have a domain):

# ============================================
# Variables for Route53 (add to variables.tf)
# ============================================

# Add these to variables.tf:
# variable "domain_name" {
#   description = "Domain name for the application"
#   type        = string
#   default     = ""  # e.g., "ecs.yourdomain.com"
# }
#
# variable "create_route53_record" {
#   description = "Create Route53 DNS record"
#   type        = bool
#   default     = false
# }

# ============================================
# ACM Certificate
# ============================================

resource "aws_acm_certificate" "main" {
  count             = var.create_route53_record ? 1 : 0
  domain_name       = var.domain_name
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Name = "${var.project_name}-certificate"
  }
}

# ============================================
# Route53 Record for ACM Validation
# ============================================

# Note: You need to have your hosted zone already created in Route53
# or managed in Cloudflare. This assumes Route53 hosted zone exists.

data "aws_route53_zone" "main" {
  count        = var.create_route53_record ? 1 : 0
  name         = var.domain_name
  private_zone = false
}

resource "aws_route53_record" "cert_validation" {
  count   = var.create_route53_record ? 1 : 0
  zone_id = data.aws_route53_zone.main[0].zone_id
  name    = tolist(aws_acm_certificate.main[0].domain_validation_options)[0].resource_record_name
  type    = tolist(aws_acm_certificate.main[0].domain_validation_options)[0].resource_record_type
  records = [tolist(aws_acm_certificate.main[0].domain_validation_options)[0].resource_record_value]
  ttl     = 60
}

resource "aws_acm_certificate_validation" "main" {
  count                   = var.create_route53_record ? 1 : 0
  certificate_arn         = aws_acm_certificate.main[0].arn
  validation_record_fqdns = [aws_route53_record.cert_validation[0].fqdn]
}

# ============================================
# Route53 A Record for ALB
# ============================================

resource "aws_route53_record" "alb" {
  count   = var.create_route53_record ? 1 : 0
  zone_id = data.aws_route53_zone.main[0].zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
}

Step 5: Deploy the Complete Infrastructure

Now let’s deploy our production-ready ECS infrastructure.

5.1: Configure Optional Route53 Variables

Edit terraform/terraform.tfvars:

# ... existing variables ...

# Optional: Route53 DNS Configuration
create_route53_record = false  # Set to true if you have a domain
domain_name          = ""      # e.g., "ecs.yourdomain.com"

5.2: Deploy Infrastructure

cd terraform

# Validate configuration
terraform validate

# Review deployment plan
terraform plan

# Deploy infrastructure
terraform apply

What This Creates:

  • Service Discovery: AWS Cloud Map namespace for service communication
  • ECS Task Definitions: Production-ready configurations for all services
  • ECS Services: Fargate-based services with Service Connect
  • Application Load Balancer: High-performance load balancer with health checks
  • Target Groups: Health check configuration for Nginx service
  • Auto-scaling: CPU-based scaling policies for Flask and Nginx
  • Optional: ACM certificate and Route53 DNS records

Deployment Time: Approximately 10-15 minutes

  • ALB provisioning: ~5 minutes
  • ECS services startup: ~10 minutes
  • Health checks and service registration: ~5 minutes

5.3: Monitor Deployment

# Watch ECS services
aws ecs describe-services \
  --cluster ecs-microservices-cluster \
  --services ecs-microservices-nginx ecs-microservices-flask-app ecs-microservices-redis \
  --region ap-south-1

# Check running tasks
aws ecs list-tasks \
  --cluster ecs-microservices-cluster \
  --region ap-south-1

# View service events
aws ecs describe-services \
  --cluster ecs-microservices-cluster \
  --services ecs-microservices-nginx \
  --region ap-south-1 \
  --query 'services[0].events[0:5]'

Step 6: Validate Production Deployment

Let’s thoroughly test our production deployment to ensure everything is working correctly.

6.1: Get Application Load Balancer DNS Name

terraform output alb_dns_name

Or from AWS CLI:

aws elbv2 describe-load-balancers \
  --names ecs-microservices-alb \
  --region ap-south-1 \
  --query 'LoadBalancers[0].DNSName' \
  --output text

6.2: Comprehensive API Testing

Wait 2-3 minutes for services to be fully healthy, then run these tests:

# Set ALB DNS
ALB_DNS="<your-alb-dns-name>"

# Test home endpoint
curl http://$ALB_DNS/

# Test health
curl http://$ALB_DNS/health

# Test stats
curl http://$ALB_DNS/api/stats

# Test Redis cache
curl http://$ALB_DNS/api/cache-test

# Create a user
curl -X POST http://$ALB_DNS/api/users \
  -H "Content-Type: application/json" \
  -d '{"username": "testuser"}'

# Get users
curl http://$ALB_DNS/api/users

# Get app info
curl http://$ALB_DNS/api/info

6.3: Monitor Application Logs

# Flask logs
aws logs tail /ecs/ecs-microservices/flask-app --follow --region ap-south-1

# Nginx logs
aws logs tail /ecs/ecs-microservices/nginx --follow --region ap-south-1

# Redis logs
aws logs tail /ecs/ecs-microservices/redis --follow --region ap-south-1

Step 7: Verify Service Connect Communication

ECS Service Connect enables seamless service-to-service communication using DNS names.

# Connect to Flask container using ECS Exec
TASK_ID=$(aws ecs list-tasks \
  --cluster ecs-microservices-cluster \
  --service-name ecs-microservices-flask-app \
  --region ap-south-1 \
  --query 'taskArns[0]' \
  --output text | cut -d'/' -f3)

# Exec into container
aws ecs execute-command \
  --cluster ecs-microservices-cluster \
  --task $TASK_ID \
  --container flask-app \
  --interactive \
  --command "/bin/bash" \
  --region ap-south-1

# Inside container, test Redis connectivity
# redis-cli -h redis ping
# Should return: PONG
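Note that `aws ecs execute-command` only works if ECS Exec was enabled when the service was created, which the Step 2 definitions do not do. A hedged sketch of the extra pieces (merge into the existing service resource; the IAM actions listed are the standard ECS Exec prerequisites — verify against your task role from Part 2):

```hcl
# ECS Exec prerequisite: enable execute-command on the service.
# Merge this argument into the aws_ecs_service.flask_app resource from Step 2.2.
resource "aws_ecs_service" "flask_app" {
  # ... existing arguments from Step 2.2 ...
  enable_execute_command = true
}

# The task role also needs these SSM messaging permissions:
#   ssmmessages:CreateControlChannel, ssmmessages:CreateDataChannel,
#   ssmmessages:OpenControlChannel,   ssmmessages:OpenDataChannel
```

You’ll also need the Session Manager plugin installed locally for the AWS CLI.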

Step 8: Optional - Configure Custom Domain with Cloudflare

If your domain is in Cloudflare but you want to use ACM for SSL:

8.1: Create ACM Certificate Manually

  1. Go to AWS Certificate Manager in ap-south-1
  2. Request a public certificate for your subdomain (e.g., ecs.yourdomain.com)
  3. Choose DNS validation
  4. Copy the CNAME record details

8.2: Add Validation CNAME to Cloudflare

  1. Log in to Cloudflare
  2. Go to your domain’s DNS settings
  3. Add a CNAME record with ACM validation details
  4. Wait for certificate to be issued (~5 minutes)

8.3: Create A Record Pointing to ALB

  1. In Cloudflare, create an A record:
    • Name: ecs (or your subdomain)
    • Content: Your ALB DNS name
    • Proxy status: DNS only (gray cloud) - important for ALB
    • TTL: Auto

8.4: Update ALB to Use HTTPS

Uncomment the HTTPS listener in alb.tf and update the certificate ARN:

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
  certificate_arn   = "arn:aws:acm:ap-south-1:ACCOUNT_ID:certificate/CERT_ID"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.nginx.arn
  }
}

Then apply:

terraform apply

Production Troubleshooting

Issue: ECS Tasks Fail to Start

Check:

aws ecs describe-tasks \
  --cluster ecs-microservices-cluster \
  --tasks <task-arn> \
  --region ap-south-1

Common causes:

  • ECR image not found - push images again
  • IAM role missing permissions
  • Insufficient memory/CPU

Issue: ALB health checks failing

Check target health:

aws elbv2 describe-target-health \
  --target-group-arn <target-group-arn> \
  --region ap-south-1

Common causes:

  • Security group not allowing ALB → ECS traffic
  • Health check path incorrect
  • Container not listening on correct port

Issue: Service Connect not working

Verify:

  • All services have Service Connect enabled
  • Services are in the same namespace
  • DNS names are correct (redis, flask-app)

Issue: Can’t connect to database

Check:

  • RDS security group allows traffic from ECS security group
  • Database credentials are correct in task definition
  • RDS instance is in available state
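The most common culprit is the first item: the RDS security group must explicitly allow PostgreSQL traffic from the ECS tasks security group. A sketch of the rule (security group names follow Part 1’s naming and are assumptions here):

```hcl
# Sketch: allow PostgreSQL from the ECS tasks security group to RDS.
# aws_security_group.rds and aws_security_group.ecs_tasks come from Part 1.
resource "aws_security_group_rule" "rds_from_ecs" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds.id
  source_security_group_id = aws_security_group.ecs_tasks.id
}
```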

Production Cost Optimization

Current Monthly Costs (~$167/month)

| Component | Configuration | Monthly Cost |
| --- | --- | --- |
| NAT Gateways | 2x Multi-AZ | ~$65 |
| RDS PostgreSQL | Multi-AZ, db.t3.micro | ~$35 |
| ECS Fargate | 2 Nginx + 2 Flask + 1 Redis | ~$40 |
| Application Load Balancer | Internet-facing | ~$20 |
| Data Transfer | ALB to ECS | ~$5 |
| CloudWatch Logs | 7-day retention | ~$2 |

Cost Reduction Strategies

For Development/Testing:

  1. Single NAT Gateway: Use 1 NAT Gateway instead of 2 (saves ~$32/month)
  2. RDS Single-AZ: Use single-AZ for dev (saves ~$17/month)
  3. Reduced Task Count: 1 of each service (saves ~$20/month)
  4. Scheduled Shutdown: Stop services when not in use
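The scheduled-shutdown idea can be automated with Application Auto Scaling scheduled actions against the scalable targets from Step 2.4. A sketch for the Flask service (the cron expression is an assumed schedule, in UTC; add a matching "start" action to scale back up):

```hcl
# Sketch: scale the Flask service to zero every weekday evening.
# Schedule is an assumption (20:00 UTC, Mon-Fri); pair with a start action.
resource "aws_appautoscaling_scheduled_action" "flask_app_stop" {
  name               = "${var.project_name}-flask-app-stop"
  service_namespace  = aws_appautoscaling_target.flask_app.service_namespace
  resource_id        = aws_appautoscaling_target.flask_app.resource_id
  scalable_dimension = aws_appautoscaling_target.flask_app.scalable_dimension
  schedule           = "cron(0 20 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}
```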

For Production:

  1. Reserved Instances: Use RDS Reserved Instances for 1-year commitment
  2. Spot Instances: Use Fargate Spot for non-critical workloads
  3. Right-sizing: Monitor and adjust CPU/memory based on usage
  4. Lifecycle Policies: Implement ECR image lifecycle policies
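The ECR lifecycle policy from item 4 can be expressed in Terraform against the repositories from Part 1. A sketch (keeping 10 images is an assumed retention count):

```hcl
# Sketch: expire all but the 10 most recent images in the Flask repository.
# The retention count is an assumption; repeat for the Nginx repository.
resource "aws_ecr_lifecycle_policy" "flask_app" {
  repository = aws_ecr_repository.flask_app.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Expire all but the 10 most recent images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = 10
      }
      action = { type = "expire" }
    }]
  })
}
```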

Production Best Practices

High Availability Features

  1. Multi-AZ Deployment: Services distributed across 2 availability zones
  2. Auto-scaling: CPU-based scaling handles traffic spikes automatically
  3. Health Checks: ALB and container-level health monitoring
  4. Service Discovery: AWS Cloud Map for reliable service communication
  5. Load Balancing: ALB distributes traffic across healthy instances

Security Considerations

  1. Network Isolation: Private subnets for ECS tasks
  2. Security Groups: Least privilege access controls
  3. SSL/TLS: HTTPS termination at ALB
  4. IAM Roles: Task execution and task roles with minimal permissions
  5. Encryption: Data encrypted at rest and in transit

Monitoring and Observability

  1. CloudWatch Logs: Centralized logging for all services
  2. Health Checks: Multi-level health monitoring
  3. Auto-scaling: Automatic response to traffic changes
  4. Service Connect: Built-in service mesh capabilities
  5. Cost Monitoring: Detailed cost breakdown and optimization

Next Steps

Part 4 Complete! Your production deployment now includes:

  • ECS Fargate Services running across multiple AZs
  • Application Load Balancer with health checks and SSL termination
  • Auto-scaling policies for handling traffic spikes
  • Service Connect for seamless microservices communication
  • Production monitoring with CloudWatch logs and metrics
  • Cost optimization strategies for different environments

Proceed to Part 5: CI/CD Pipeline where we’ll automate deployments with GitHub Actions and implement blue-green deployments.

Key Takeaways

Architecture Benefits

  1. Serverless Architecture with ECS Fargate eliminates server management
  2. Service Mesh capabilities through ECS Service Connect
  3. Auto-scaling ensures optimal resource utilization
  4. High Availability across multiple availability zones
  5. Cost Optimization with detailed cost breakdown and strategies

Production Readiness

  1. Health Monitoring at multiple levels ensures reliability
  2. Security Best Practices with network isolation and IAM roles
  3. Scalability through auto-scaling and load balancing
  4. Observability with comprehensive logging and monitoring
  5. Cost Management with optimization strategies for different environments

Operational Excellence

  1. Service Discovery simplifies microservices communication
  2. Task Definitions are versioned for reliable deployments
  3. Load Balancing distributes traffic efficiently
  4. Auto-scaling handles traffic spikes automatically
  5. CloudWatch Integration provides comprehensive monitoring

This production deployment provides a robust, scalable, and secure foundation for running microservices on AWS ECS. The infrastructure is designed for high availability, cost optimization, and operational excellence.


Ready for automation? In Part 5, we’ll implement CI/CD pipelines to automate deployments and ensure continuous delivery!

Questions or feedback? Feel free to reach out in the comments below!
