Building Production-Grade ECS Microservices with CI/CD - Part 4: ECS Deployment

Complete guide to deploying containerized microservices to AWS ECS Fargate with production-ready configurations, covering task definitions, service discovery, auto-scaling, load balancing, and monitoring.

Welcome to Part 4 of our comprehensive series on building production-grade microservices on AWS ECS. In this installment, we’ll deploy our containerized applications to AWS ECS Fargate, configure service discovery, implement auto-scaling, and set up production-ready load balancing.

What We’ll Deploy

In this phase, we’ll create a complete production deployment including:

  1. ECS Task Definitions - Production-ready configurations for Flask, Nginx, and Redis services
  2. Application Load Balancer - High-performance load balancing with health checks and SSL termination
  3. ECS Services - Fargate-based services with Service Connect for seamless communication
  4. Auto-scaling Policies - CPU-based scaling to handle traffic spikes automatically
  5. Service Discovery - AWS Cloud Map integration for service-to-service communication
  6. Optional SSL/HTTPS - ACM certificates and Route53 DNS configuration

Production Deployment Architecture

Our ECS deployment follows a modern, scalable architecture:

┌─────────────────────────────────────────────────────────────────┐
│                    Internet Traffic                             │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│              Route53 DNS (Optional)                           │
│              • Custom Domain                                   │
│              • ACM SSL Certificate                             │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│            Application Load Balancer (ALB)                     │
│            • HTTP/HTTPS Listeners                              │
│            • Health Checks                                     │
│            • SSL Termination                                   │
│            • Cross-Zone Load Balancing                         │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ECS Fargate Services                        │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │   Nginx Service │  │  Flask Service  │  │  Redis Service  │ │
│  │   • 2 Tasks     │  │  • 2 Tasks      │  │  • 1 Task       │ │
│  │   • Auto-scale  │  │  • Auto-scale   │  │  • Service      │ │
│  │   • Health      │  │  • Health       │  │    Connect      │ │
│  │     Checks      │  │    Checks       │  │                 │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
└────────────────────────┬────────────────────────────────────────┘
                         │
            ┌────────────┴────────────┐
            ▼                        ▼
┌─────────────────────┐    ┌─────────────────────┐
│   RDS PostgreSQL    │    │   Service Connect    │
│   • Multi-AZ        │    │   • Cloud Map DNS    │
│   • Encrypted       │    │   • Health Checks    │
│   • Backups         │    │   • Load Balancing   │
└─────────────────────┘    └─────────────────────┘

Prerequisites

Before we begin, ensure you have completed the previous phases:

Infrastructure Requirements

  • Part 1 Complete: VPC, subnets, security groups, RDS database, ECR repositories
  • Part 2 Complete: ECS cluster, IAM roles, CloudWatch log groups
  • Part 3 Complete: Containerized applications pushed to ECR

AWS Permissions

Ensure your AWS credentials have the following permissions:

  • ECS: ecs:CreateService, ecs:UpdateService, ecs:RegisterTaskDefinition
  • ELB: elasticloadbalancing:CreateLoadBalancer, elasticloadbalancing:CreateTargetGroup
  • Auto Scaling: application-autoscaling:RegisterScalableTarget
  • Service Discovery: servicediscovery:CreateNamespace, servicediscovery:CreateService

Required Information

  • ECR repository URLs for Flask and Nginx images
  • RDS endpoint and credentials
  • VPC and subnet IDs
  • Security group IDs

Step-by-Step Implementation

Let’s deploy our production-ready ECS services step by step.

Step 1: Configure Service Discovery

First, we’ll set up AWS Cloud Map for service-to-service communication.

1.1: Create Service Discovery Namespace

Add to terraform/main.tf or create new file terraform/ecs.tf:

# ============================================
# Service Discovery Namespace
# ============================================

resource "aws_service_discovery_private_dns_namespace" "main" {
  name        = "${var.project_name}.local"
  description = "Private DNS namespace for service discovery"
  vpc         = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-service-discovery"
  }
}

# ============================================
# CloudWatch Log Groups
# ============================================

resource "aws_cloudwatch_log_group" "redis" {
  name              = "/ecs/${var.project_name}/redis"
  retention_in_days = 7

  tags = {
    Name = "${var.project_name}-redis-logs"
  }
}

resource "aws_cloudwatch_log_group" "flask_app" {
  name              = "/ecs/${var.project_name}/flask-app"
  retention_in_days = 7

  tags = {
    Name = "${var.project_name}-flask-app-logs"
  }
}

resource "aws_cloudwatch_log_group" "nginx" {
  name              = "/ecs/${var.project_name}/nginx"
  retention_in_days = 7

  tags = {
    Name = "${var.project_name}-nginx-logs"
  }
}

Service Discovery Benefits:

  • DNS-based service discovery for microservices communication
  • Health checks for automatic service registration
  • Load balancing across service instances
  • Service mesh capabilities with ECS Service Connect
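Rather than repeating the namespace ARN in every service's `service_connect_configuration`, you can set a cluster-wide default. This is a sketch to merge into the `aws_ecs_cluster.main` resource created in Part 2, not a new resource to add:

```hcl
# Optional: make the namespace the cluster-wide default for Service Connect.
# Services that omit a namespace in service_connect_configuration inherit it.
# Sketch only — merge into the aws_ecs_cluster.main resource from Part 2.
resource "aws_ecs_cluster" "main" {
  name = "${var.project_name}-cluster"

  service_connect_defaults {
    namespace = aws_service_discovery_private_dns_namespace.main.arn
  }
}
```

The per-service `namespace` arguments shown later in this post still work either way; the cluster default simply acts as a fallback.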

Step 2: Deploy ECS Services

Now we’ll create production-ready ECS task definitions and services for our microservices.

2.1: Redis Service Configuration

We’ll deploy Redis as an ECS service for cost optimization (instead of ElastiCache):

# ============================================
# Redis Task Definition and Service
# ============================================

resource "aws_ecs_task_definition" "redis" {
  family                   = "${var.project_name}-redis"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([
    {
      name      = "redis"
      image     = "redis:7-alpine"
      essential = true

      portMappings = [
        {
          containerPort = 6379
          protocol      = "tcp"
          name          = "redis"
        }
      ]

      healthCheck = {
        command     = ["CMD-SHELL", "redis-cli ping || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.redis.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "redis"
        }
      }
    }
  ])

  tags = {
    Name = "${var.project_name}-redis-task"
  }
}

resource "aws_ecs_service" "redis" {
  name            = "${var.project_name}-redis"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.redis.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  service_connect_configuration {
    enabled   = true
    namespace = aws_service_discovery_private_dns_namespace.main.arn

    service {
      port_name = "redis"
      client_alias {
        port     = 6379
        dns_name = "redis"
      }
    }
  }

  tags = {
    Name = "${var.project_name}-redis-service"
  }
}

2.2: Flask Application Service

Our Flask API service with database integration and Redis caching:

resource "aws_ecs_task_definition" "flask_app" {
  family                   = "${var.project_name}-flask-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([
    {
      name      = "flask-app"
      image     = "${aws_ecr_repository.flask_app.repository_url}:latest"
      essential = true

      environment = [
        {
          name  = "DB_HOST"
          value = split(":", aws_db_instance.main.endpoint)[0]
        },
        {
          name  = "DB_PORT"
          value = "5432"
        },
        {
          name  = "DB_NAME"
          value = aws_db_instance.main.db_name
        },
        {
          name  = "DB_USER"
          value = var.db_username
        },
        {
          name  = "DB_PASSWORD"
          value = var.db_password
        },
        {
          name  = "REDIS_HOST"
          value = "redis"
        },
        {
          name  = "REDIS_PORT"
          value = "6379"
        },
        {
          name  = "ENVIRONMENT"
          value = var.environment
        }
      ]

      portMappings = [
        {
          containerPort = 5000
          protocol      = "tcp"
          name          = "flask-app"
        }
      ]

      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:5000/health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.flask_app.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "flask-app"
        }
      }
    }
  ])

  tags = {
    Name = "${var.project_name}-flask-app-task"
  }
}

resource "aws_ecs_service" "flask_app" {
  name            = "${var.project_name}-flask-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.flask_app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  service_connect_configuration {
    enabled   = true
    namespace = aws_service_discovery_private_dns_namespace.main.arn

    service {
      port_name = "flask-app"
      client_alias {
        port     = 5000
        dns_name = "flask-app"
      }
    }
  }

  depends_on = [
    aws_ecs_service.redis,
    aws_lb_listener.http
  ]

  tags = {
    Name = "${var.project_name}-flask-app-service"
  }
}
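One caveat about the task definition above: `DB_PASSWORD` is injected as a plain environment variable, which exposes it in the ECS console and in Terraform state. For production, prefer the container-level `secrets` block backed by SSM Parameter Store or Secrets Manager. A hedged sketch (the `aws_ssm_parameter` resource is an assumption, not something created earlier in this series):

```hcl
# Sketch: store the password as a SecureString and inject it via "secrets"
# instead of the plain "environment" block. Assumed resource, adjust naming.
resource "aws_ssm_parameter" "db_password" {
  name  = "/${var.project_name}/db-password"
  type  = "SecureString"
  value = var.db_password
}

# Inside container_definitions, replace the DB_PASSWORD environment entry with:
#   secrets = [
#     {
#       name      = "DB_PASSWORD"
#       valueFrom = aws_ssm_parameter.db_password.arn
#     }
#   ]
# The task execution role also needs ssm:GetParameters on this parameter.
```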

2.3: Nginx Reverse Proxy Service

Our Nginx service acts as the entry point and load balancer:

resource "aws_ecs_task_definition" "nginx" {
  family                   = "${var.project_name}-nginx"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([
    {
      name      = "nginx"
      image     = "${aws_ecr_repository.nginx.repository_url}:latest"
      essential = true

      portMappings = [
        {
          containerPort = 80
          protocol      = "tcp"
          name          = "nginx"
        }
      ]

      healthCheck = {
        command     = ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost/nginx-health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 30
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.nginx.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "nginx"
        }
      }
    }
  ])

  tags = {
    Name = "${var.project_name}-nginx-task"
  }
}

resource "aws_ecs_service" "nginx" {
  name            = "${var.project_name}-nginx"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.nginx.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.nginx.arn
    container_name   = "nginx"
    container_port   = 80
  }

  service_connect_configuration {
    enabled   = true
    namespace = aws_service_discovery_private_dns_namespace.main.arn
  }

  depends_on = [
    aws_ecs_service.flask_app,
    aws_lb_listener.http
  ]

  tags = {
    Name = "${var.project_name}-nginx-service"
  }
}
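The services above do not enable the ECS deployment circuit breaker, which automatically rolls back a deployment whose tasks keep failing health checks. It is worth adding for production; a sketch of the nested block to merge into any of the `aws_ecs_service` resources above (shown standalone here, it is not valid on its own):

```hcl
# Optional: automatic rollback of failed deployments.
# Merge into an aws_ecs_service resource alongside network_configuration.
deployment_circuit_breaker {
  enable   = true
  rollback = true
}
```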

2.4: Auto-Scaling Configuration

We’ll configure CPU-based auto-scaling for both Flask and Nginx services:

resource "aws_appautoscaling_target" "flask_app" {
  max_capacity       = 4
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.flask_app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "flask_app_cpu" {
  name               = "${var.project_name}-flask-app-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.flask_app.resource_id
  scalable_dimension = aws_appautoscaling_target.flask_app.scalable_dimension
  service_namespace  = aws_appautoscaling_target.flask_app.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}

# ============================================
# Auto Scaling for Nginx
# ============================================

resource "aws_appautoscaling_target" "nginx" {
  max_capacity       = 4
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.nginx.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "nginx_cpu" {
  name               = "${var.project_name}-nginx-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.nginx.resource_id
  scalable_dimension = aws_appautoscaling_target.nginx.scalable_dimension
  service_namespace  = aws_appautoscaling_target.nginx.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
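CPU is not the only useful scaling signal. For the Nginx tier, tracking request count per target often follows traffic more directly than CPU. A sketch, assuming the `aws_lb.main` and `aws_lb_target_group.nginx` resources defined in Step 3; the target of 1000 is an assumed starting point, not a recommendation:

```hcl
# Alternative: scale Nginx on ALB requests per target instead of CPU.
# target_value is average requests per target; 1000 is an assumed starting point.
resource "aws_appautoscaling_policy" "nginx_requests" {
  name               = "${var.project_name}-nginx-request-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.nginx.resource_id
  scalable_dimension = aws_appautoscaling_target.nginx.scalable_dimension
  service_namespace  = aws_appautoscaling_target.nginx.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 1000

    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.nginx.arn_suffix}"
    }
  }
}
```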

Step 3: Configure Application Load Balancer

We’ll set up a production-ready ALB with health checks and SSL termination capabilities.

3.1: Create ALB Configuration

Create terraform/alb.tf:

# ============================================
# Application Load Balancer
# ============================================

resource "aws_lb" "main" {
  name               = "${var.project_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection       = false
  enable_http2                     = true
  enable_cross_zone_load_balancing = true

  tags = {
    Name = "${var.project_name}-alb"
  }
}

# ============================================
# Target Groups
# ============================================

resource "aws_lb_target_group" "nginx" {
  name        = "${var.project_name}-nginx-tg"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"

  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    path                = "/nginx-health"
    matcher             = "200"
  }

  deregistration_delay = 30

  tags = {
    Name = "${var.project_name}-nginx-tg"
  }
}

# ============================================
# ALB Listeners
# ============================================

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.nginx.arn
  }

  tags = {
    Name = "${var.project_name}-http-listener"
  }
}

# Optional: HTTPS Listener (uncomment when ACM certificate is ready)
# resource "aws_lb_listener" "https" {
#   load_balancer_arn = aws_lb.main.arn
#   port              = "443"
#   protocol          = "HTTPS"
#   ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
#   certificate_arn   = aws_acm_certificate.main.arn

#   default_action {
#     type             = "forward"
#     target_group_arn = aws_lb_target_group.nginx.arn
#   }

#   tags = {
#     Name = "${var.project_name}-https-listener"
#   }
# }

# Optional: Redirect HTTP to HTTPS (uncomment when HTTPS is configured)
# resource "aws_lb_listener_rule" "redirect_http_to_https" {
#   listener_arn = aws_lb_listener.http.arn
#   priority     = 1

#   action {
#     type = "redirect"
#     redirect {
#       port        = "443"
#       protocol    = "HTTPS"
#       status_code = "HTTP_301"
#     }
#   }

#   condition {
#     path_pattern {
#       values = ["/*"]
#     }
#   }
# }

# ============================================
# Outputs
# ============================================

output "alb_dns_name" {
  description = "ALB DNS name"
  value       = aws_lb.main.dns_name
}

output "alb_zone_id" {
  description = "ALB Zone ID for Route53"
  value       = aws_lb.main.zone_id
}

output "alb_arn" {
  description = "ALB ARN"
  value       = aws_lb.main.arn
}

Step 4: Optional - Route53 and ACM SSL

Create terraform/route53.tf (only if you have a domain):

# ============================================
# Variables for Route53 (add to variables.tf)
# ============================================

# Add these to variables.tf:
# variable "domain_name" {
#   description = "Domain name for the application"
#   type        = string
#   default     = ""  # e.g., "ecs.yourdomain.com"
# }
#
# variable "create_route53_record" {
#   description = "Create Route53 DNS record"
#   type        = bool
#   default     = false
# }

# ============================================
# ACM Certificate
# ============================================

resource "aws_acm_certificate" "main" {
  count             = var.create_route53_record ? 1 : 0
  domain_name       = var.domain_name
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Name = "${var.project_name}-certificate"
  }
}

# ============================================
# Route53 Record for ACM Validation
# ============================================

# Note: You need to have your hosted zone already created in Route53
# or managed in Cloudflare. This assumes Route53 hosted zone exists.

data "aws_route53_zone" "main" {
  count        = var.create_route53_record ? 1 : 0
  name         = var.domain_name
  private_zone = false
}

resource "aws_route53_record" "cert_validation" {
  count   = var.create_route53_record ? 1 : 0
  zone_id = data.aws_route53_zone.main[0].zone_id
  name    = tolist(aws_acm_certificate.main[0].domain_validation_options)[0].resource_record_name
  type    = tolist(aws_acm_certificate.main[0].domain_validation_options)[0].resource_record_type
  records = [tolist(aws_acm_certificate.main[0].domain_validation_options)[0].resource_record_value]
  ttl     = 60
}

resource "aws_acm_certificate_validation" "main" {
  count                   = var.create_route53_record ? 1 : 0
  certificate_arn         = aws_acm_certificate.main[0].arn
  validation_record_fqdns = [aws_route53_record.cert_validation[0].fqdn]
}

# ============================================
# Route53 A Record for ALB
# ============================================

resource "aws_route53_record" "alb" {
  count   = var.create_route53_record ? 1 : 0
  zone_id = data.aws_route53_zone.main[0].zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
}

Step 5: Deploy the Complete Infrastructure

Now let’s deploy our production-ready ECS infrastructure.

5.1: Configure Optional Route53 Variables

Edit terraform/terraform.tfvars:

# ... existing variables ...

# Optional: Route53 DNS Configuration
create_route53_record = false  # Set to true if you have a domain
domain_name          = ""      # e.g., "ecs.yourdomain.com"

5.2: Deploy Infrastructure

cd terraform

# Validate configuration
terraform validate

# Review deployment plan
terraform plan

# Deploy infrastructure
terraform apply

What This Creates:

  • Service Discovery: AWS Cloud Map namespace for service communication
  • ECS Task Definitions: Production-ready configurations for all services
  • ECS Services: Fargate-based services with Service Connect
  • Application Load Balancer: High-performance load balancer with health checks
  • Target Groups: Health check configuration for Nginx service
  • Auto-scaling: CPU-based scaling policies for Flask and Nginx
  • Optional: ACM certificate and Route53 DNS records

Deployment Time: Approximately 10-15 minutes

  • ALB provisioning: ~5 minutes
  • ECS services startup: ~10 minutes
  • Health checks and service registration: ~5 minutes

5.3: Monitor Deployment

# Watch ECS services
aws ecs describe-services \
  --cluster ecs-microservices-cluster \
  --services ecs-microservices-nginx ecs-microservices-flask-app ecs-microservices-redis \
  --region ap-south-1

# Check running tasks
aws ecs list-tasks \
  --cluster ecs-microservices-cluster \
  --region ap-south-1

# View service events
aws ecs describe-services \
  --cluster ecs-microservices-cluster \
  --services ecs-microservices-nginx \
  --region ap-south-1 \
  --query 'services[0].events[0:5]'

Step 6: Validate Production Deployment

Let’s thoroughly test our production deployment to ensure everything is working correctly.

6.1: Get Application Load Balancer DNS Name

terraform output alb_dns_name

Or from AWS CLI:

aws elbv2 describe-load-balancers \
  --names ecs-microservices-alb \
  --region ap-south-1 \
  --query 'LoadBalancers[0].DNSName' \
  --output text

6.2: Comprehensive API Testing

Wait 2-3 minutes for services to be fully healthy, then run these tests:

# Set ALB DNS
ALB_DNS="<your-alb-dns-name>"

# Test home endpoint
curl http://$ALB_DNS/

# Test health
curl http://$ALB_DNS/health

# Test stats
curl http://$ALB_DNS/api/stats

# Test Redis cache
curl http://$ALB_DNS/api/cache-test

# Create a user
curl -X POST http://$ALB_DNS/api/users \
  -H "Content-Type: application/json" \
  -d '{"username": "testuser"}'

# Get users
curl http://$ALB_DNS/api/users

# Get app info
curl http://$ALB_DNS/api/info

6.3: Monitor Application Logs

# Flask logs
aws logs tail /ecs/ecs-microservices/flask-app --follow --region ap-south-1

# Nginx logs
aws logs tail /ecs/ecs-microservices/nginx --follow --region ap-south-1

# Redis logs
aws logs tail /ecs/ecs-microservices/redis --follow --region ap-south-1

Step 7: Verify Service Connect Communication

ECS Service Connect enables seamless service-to-service communication using DNS names.

# Connect to Flask container using ECS Exec
TASK_ID=$(aws ecs list-tasks \
  --cluster ecs-microservices-cluster \
  --service-name ecs-microservices-flask-app \
  --region ap-south-1 \
  --query 'taskArns[0]' \
  --output text | cut -d'/' -f3)

# Exec into container
aws ecs execute-command \
  --cluster ecs-microservices-cluster \
  --task $TASK_ID \
  --container flask-app \
  --interactive \
  --command "/bin/bash" \
  --region ap-south-1

# Inside container, test Redis connectivity
# redis-cli -h redis ping
# Should return: PONG
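Note that `aws ecs execute-command` only works if ECS Exec was enabled when the service was created, which the Step 2 definitions do not do. A hedged sketch of the extra pieces (merge into the existing service resource; the IAM actions listed are the standard ECS Exec prerequisites — verify against your task role from Part 2):

```hcl
# ECS Exec prerequisite: enable execute-command on the service.
# Merge this argument into the aws_ecs_service.flask_app resource from Step 2.2.
resource "aws_ecs_service" "flask_app" {
  # ... existing arguments from Step 2.2 ...
  enable_execute_command = true
}

# The task role also needs these SSM messaging permissions:
#   ssmmessages:CreateControlChannel, ssmmessages:CreateDataChannel,
#   ssmmessages:OpenControlChannel,   ssmmessages:OpenDataChannel
```

You’ll also need the Session Manager plugin installed locally for the AWS CLI.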

Step 8: Optional - Configure Custom Domain with Cloudflare

If your domain is in Cloudflare but you want to use ACM for SSL:

8.1: Create ACM Certificate Manually

  1. Go to AWS Certificate Manager in ap-south-1
  2. Request a public certificate for your subdomain (e.g., ecs.yourdomain.com)
  3. Choose DNS validation
  4. Copy the CNAME record details

8.2: Add Validation CNAME to Cloudflare

  1. Log in to Cloudflare
  2. Go to your domain’s DNS settings
  3. Add a CNAME record with ACM validation details
  4. Wait for certificate to be issued (~5 minutes)

8.3: Create A Record Pointing to ALB

  1. In Cloudflare, create an A record:
    • Name: ecs (or your subdomain)
    • Content: Your ALB DNS name
    • Proxy status: DNS only (gray cloud) - important for ALB
    • TTL: Auto

8.4: Update ALB to Use HTTPS

Uncomment the HTTPS listener in alb.tf and update the certificate ARN:

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
  certificate_arn   = "arn:aws:acm:ap-south-1:ACCOUNT_ID:certificate/CERT_ID"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.nginx.arn
  }
}

Then apply:

terraform apply

Production Troubleshooting

Issue: ECS Tasks Fail to Start

Check:

aws ecs describe-tasks \
  --cluster ecs-microservices-cluster \
  --tasks <task-arn> \
  --region ap-south-1

Common causes:

  • ECR image not found - push images again
  • IAM role missing permissions
  • Insufficient memory/CPU

Issue: ALB health checks failing

Check target health:

aws elbv2 describe-target-health \
  --target-group-arn <target-group-arn> \
  --region ap-south-1

Common causes:

  • Security group not allowing ALB → ECS traffic
  • Health check path incorrect
  • Container not listening on correct port

Issue: Service Connect not working

Verify:

  • All services have Service Connect enabled
  • Services are in the same namespace
  • DNS names are correct (redis, flask-app)

Issue: Can’t connect to database

Check:

  • RDS security group allows traffic from ECS security group
  • Database credentials are correct in task definition
  • RDS instance is in available state
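The most common culprit is the first item: the RDS security group must explicitly allow PostgreSQL traffic from the ECS tasks security group. A sketch of the rule (security group names follow Part 1’s naming and are assumptions here):

```hcl
# Sketch: allow PostgreSQL from the ECS tasks security group to RDS.
# aws_security_group.rds and aws_security_group.ecs_tasks come from Part 1.
resource "aws_security_group_rule" "rds_from_ecs" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds.id
  source_security_group_id = aws_security_group.ecs_tasks.id
}
```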

Production Cost Optimization

Current Monthly Costs (~$167/month)

| Component | Configuration | Monthly Cost |
| --- | --- | --- |
| NAT Gateways | 2x Multi-AZ | ~$65 |
| RDS PostgreSQL | Multi-AZ, db.t3.micro | ~$35 |
| ECS Fargate | 2 Nginx + 2 Flask + 1 Redis | ~$40 |
| Application Load Balancer | Internet-facing | ~$20 |
| Data Transfer | ALB to ECS | ~$5 |
| CloudWatch Logs | 7-day retention | ~$2 |

Cost Reduction Strategies

For Development/Testing:

  1. Single NAT Gateway: Use 1 NAT Gateway instead of 2 (saves ~$32/month)
  2. RDS Single-AZ: Use single-AZ for dev (saves ~$17/month)
  3. Reduced Task Count: 1 of each service (saves ~$20/month)
  4. Scheduled Shutdown: Stop services when not in use
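The scheduled-shutdown idea can be automated with Application Auto Scaling scheduled actions against the scalable targets from Step 2.4. A sketch for the Flask service (the cron expression is an assumed schedule, in UTC; add a matching "start" action to scale back up):

```hcl
# Sketch: scale the Flask service to zero every weekday evening.
# Schedule is an assumption (20:00 UTC, Mon-Fri); pair with a start action.
resource "aws_appautoscaling_scheduled_action" "flask_app_stop" {
  name               = "${var.project_name}-flask-app-stop"
  service_namespace  = aws_appautoscaling_target.flask_app.service_namespace
  resource_id        = aws_appautoscaling_target.flask_app.resource_id
  scalable_dimension = aws_appautoscaling_target.flask_app.scalable_dimension
  schedule           = "cron(0 20 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 0
    max_capacity = 0
  }
}
```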

For Production:

  1. Reserved Instances: Use RDS Reserved Instances for 1-year commitment
  2. Spot Instances: Use Fargate Spot for non-critical workloads
  3. Right-sizing: Monitor and adjust CPU/memory based on usage
  4. Lifecycle Policies: Implement ECR image lifecycle policies
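The ECR lifecycle policy from item 4 can be expressed in Terraform against the repositories from Part 1. A sketch (keeping 10 images is an assumed retention count):

```hcl
# Sketch: expire all but the 10 most recent images in the Flask repository.
# The retention count is an assumption; repeat for the Nginx repository.
resource "aws_ecr_lifecycle_policy" "flask_app" {
  repository = aws_ecr_repository.flask_app.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Expire all but the 10 most recent images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = 10
      }
      action = { type = "expire" }
    }]
  })
}
```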

Production Best Practices

High Availability Features

  1. Multi-AZ Deployment: Services distributed across 2 availability zones
  2. Auto-scaling: CPU-based scaling handles traffic spikes automatically
  3. Health Checks: ALB and container-level health monitoring
  4. Service Discovery: AWS Cloud Map for reliable service communication
  5. Load Balancing: ALB distributes traffic across healthy instances

Security Considerations

  1. Network Isolation: Private subnets for ECS tasks
  2. Security Groups: Least privilege access controls
  3. SSL/TLS: HTTPS termination at ALB
  4. IAM Roles: Task execution and task roles with minimal permissions
  5. Encryption: Data encrypted at rest and in transit

Monitoring and Observability

  1. CloudWatch Logs: Centralized logging for all services
  2. Health Checks: Multi-level health monitoring
  3. Auto-scaling: Automatic response to traffic changes
  4. Service Connect: Built-in service mesh capabilities
  5. Cost Monitoring: Detailed cost breakdown and optimization

Next Steps

Part 4 Complete! Your production deployment now includes:

  • ECS Fargate Services running across multiple AZs
  • Application Load Balancer with health checks and SSL termination
  • Auto-scaling policies for handling traffic spikes
  • Service Connect for seamless microservices communication
  • Production monitoring with CloudWatch logs and metrics
  • Cost optimization strategies for different environments

Proceed to Part 5: CI/CD Pipeline where we’ll automate deployments with GitHub Actions and implement blue-green deployments.

Key Takeaways

Architecture Benefits

  1. Serverless Architecture with ECS Fargate eliminates server management
  2. Service Mesh capabilities through ECS Service Connect
  3. Auto-scaling ensures optimal resource utilization
  4. High Availability across multiple availability zones
  5. Cost Optimization with detailed cost breakdown and strategies

Production Readiness

  1. Health Monitoring at multiple levels ensures reliability
  2. Security Best Practices with network isolation and IAM roles
  3. Scalability through auto-scaling and load balancing
  4. Observability with comprehensive logging and monitoring
  5. Cost Management with optimization strategies for different environments

Operational Excellence

  1. Service Discovery simplifies microservices communication
  2. Task Definitions are versioned for reliable deployments
  3. Load Balancing distributes traffic efficiently
  4. Auto-scaling handles traffic spikes automatically
  5. CloudWatch Integration provides comprehensive monitoring

This production deployment provides a robust, scalable, and secure foundation for running microservices on AWS ECS. The infrastructure is designed for high availability, cost optimization, and operational excellence.


Ready for automation? In Part 5, we’ll implement CI/CD pipelines to automate deployments and ensure continuous delivery!

Questions or feedback? Feel free to reach out in the comments below!
