Building Production-Grade ECS Microservices with CI/CD - Part 2: Infrastructure Setup with Terraform

Step-by-step guide to building production-ready infrastructure for ECS microservices using Terraform, covering VPC design, security groups, RDS setup, ECR repositories, and ECS cluster configuration.

Welcome to Part 2 of our comprehensive series on building production-grade ECS microservices! In this installment, we’ll dive deep into creating the foundational infrastructure using Terraform, following Infrastructure as Code (IaC) best practices.

What We’re Building

In this phase, we’ll establish the production-ready infrastructure foundation that includes:

  • Multi-tier VPC architecture with public, private, and database subnets
  • High-availability networking across multiple availability zones
  • Secure database setup with RDS PostgreSQL Multi-AZ
  • Container registry with ECR for Docker images
  • ECS cluster ready for Fargate workloads
  • Security groups with least-privilege access
  • CloudWatch logging and monitoring setup
  • IAM roles for secure service execution

Infrastructure Architecture

Our infrastructure follows AWS Well-Architected Framework principles, implementing a robust, scalable, and cost-effective foundation.

Network Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Internet Gateway                             │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Public Subnets (2 AZs)                       │
│  ┌─────────────────────┐  ┌─────────────────────┐              │
│  │   ap-south-1a       │  │   ap-south-1b       │              │
│  │   10.0.1.0/24       │  │   10.0.2.0/24       │              │
│  │                     │  │                     │              │
│  │  ┌───────────────┐  │  │  ┌───────────────┐  │              │
│  │  │ NAT Gateway   │  │  │  │ NAT Gateway   │  │              │
│  │  │ + Elastic IP  │  │  │  │ + Elastic IP  │  │              │
│  │  └───────────────┘  │  │  └───────────────┘  │              │
│  └─────────────────────┘  └─────────────────────┘              │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Private Subnets (2 AZs)                       │
│  ┌─────────────────────┐  ┌─────────────────────┐              │
│  │   ap-south-1a       │  │   ap-south-1b       │              │
│  │   10.0.11.0/24      │  │   10.0.12.0/24      │              │
│  │                     │  │                     │              │
│  │  ┌───────────────┐  │  │  ┌───────────────┐  │              │
│  │  │ ECS Fargate   │  │  │  │ ECS Fargate   │  │              │
│  │  │ Tasks         │  │  │  │ Tasks         │  │              │
│  │  └───────────────┘  │  │  └───────────────┘  │              │
│  └─────────────────────┘  └─────────────────────┘              │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                  Database Subnets (2 AZs)                       │
│  ┌─────────────────────┐  ┌─────────────────────┐              │
│  │   ap-south-1a       │  │   ap-south-1b       │              │
│  │   10.0.21.0/24      │  │   10.0.22.0/24      │              │
│  │                     │  │                     │              │
│  │  ┌───────────────┐  │  │  ┌───────────────┐  │              │
│  │  │ RDS Primary   │  │  │  │ RDS Standby   │  │              │
│  │  │ PostgreSQL    │  │  │  │ PostgreSQL    │  │              │
│  │  └───────────────┘  │  │  └───────────────┘  │              │
│  └─────────────────────┘  └─────────────────────┘              │
└─────────────────────────────────────────────────────────────────┘

Prerequisites

Before we begin, ensure you have the following tools and access:

Required Tools

  1. AWS CLI (v2.x) - Configured with appropriate credentials
  2. Terraform (v1.6+) - Infrastructure as Code tool
  3. Git - Version control
  4. Text Editor - VS Code, Vim, or your preferred editor

AWS Account Requirements

  • Administrator access or sufficient permissions for:
    • VPC, Subnets, Internet Gateway, NAT Gateway
    • RDS (CreateDBInstance, CreateDBSubnetGroup)
    • ECR (CreateRepository, PutImage)
    • ECS (CreateCluster, RegisterTaskDefinition)
    • IAM (CreateRole, AttachRolePolicy)
    • CloudWatch (CreateLogGroup, PutLogEvents)

Cost Considerations

Estimated Monthly Costs (ap-south-1):

Resource            Configuration                  Est. Cost/Month
------------------  -----------------------------  ---------------
NAT Gateways (2)    Always running                 ~$65
RDS PostgreSQL      db.t3.micro Multi-AZ           ~$35
Elastic IPs (2)     For NAT Gateways               ~$7
ECS Cluster         No cost (serverless)           $0
ECR Repositories    No cost until images stored    $0
Total                                              ~$107/month

💡 Cost Optimization Tip: For development, consider using 1 NAT Gateway instead of 2 to save ~$32/month.
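One way to make the single-NAT option switchable is a boolean variable that drives the gateway count. The sketch below is only an illustration: `single_nat_gateway` and `nat_gateway_count` are hypothetical names that do not appear in this guide's variables.tf, and the three resources shown would replace the same-named ones in main.tf (tags omitted for brevity).

```hcl
# Hypothetical toggle, not part of the original variables.tf.
variable "single_nat_gateway" {
  description = "Use one shared NAT Gateway instead of one per AZ (dev only)"
  type        = bool
  default     = false
}

locals {
  # One gateway when the toggle is on, otherwise one per availability zone.
  nat_gateway_count = var.single_nat_gateway ? 1 : length(var.availability_zones)
}

resource "aws_eip" "nat" {
  count  = local.nat_gateway_count
  domain = "vpc"

  depends_on = [aws_internet_gateway.main]
}

resource "aws_nat_gateway" "main" {
  count         = local.nat_gateway_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
}

resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    # With a single gateway, every private route table points at index 0.
    nat_gateway_id = aws_nat_gateway.main[var.single_nat_gateway ? 0 : count.index].id
  }
}
```

The trade-off is availability: with one gateway, an outage in its AZ removes outbound internet access for private subnets in both AZs, which is usually acceptable for dev but not for production.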

Infrastructure Components

Core Networking

VPC (Virtual Private Cloud)

  • CIDR: 10.0.0.0/16
  • DNS Support: Enabled
  • DNS Hostnames: Enabled
  • Multi-AZ: ap-south-1a, ap-south-1b

Subnet Architecture

  • Public Subnets (2): Host ALB and NAT Gateways
    • CIDR: 10.0.1.0/24, 10.0.2.0/24
    • AZs: ap-south-1a, ap-south-1b
  • Private Subnets (2): Host ECS Fargate tasks
    • CIDR: 10.0.11.0/24, 10.0.12.0/24
    • AZs: ap-south-1a, ap-south-1b
  • Database Subnets (2): Host RDS PostgreSQL
    • CIDR: 10.0.21.0/24, 10.0.22.0/24
    • AZs: ap-south-1a, ap-south-1b
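The subnet ranges above can also be derived from the VPC CIDR instead of being written out by hand. The following sketch uses Terraform's built-in `cidrsubnet` function with the `vpc_cidr` and `availability_zones` variables defined in variables.tf; the local and output names are illustrative and are not part of this guide's configuration.

```hcl
locals {
  az_count = length(var.availability_zones)

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields /24 blocks,
  # and netnum selects the third octet.
  public_subnet_cidrs   = [for i in range(local.az_count) : cidrsubnet(var.vpc_cidr, 8, i + 1)]
  private_subnet_cidrs  = [for i in range(local.az_count) : cidrsubnet(var.vpc_cidr, 8, i + 11)]
  database_subnet_cidrs = [for i in range(local.az_count) : cidrsubnet(var.vpc_cidr, 8, i + 21)]
}

# Illustrative output for inspecting the computed plan with `terraform output`.
output "subnet_cidr_plan" {
  description = "Subnet CIDRs derived from vpc_cidr"
  value = {
    public   = local.public_subnet_cidrs
    private  = local.private_subnet_cidrs
    database = local.database_subnet_cidrs
  }
}
```

With the default `vpc_cidr` of 10.0.0.0/16 and two AZs, this yields the same six ranges listed above, and the ranges follow automatically if the VPC CIDR is changed to another /16.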

Security Architecture

Security Groups

  • ALB Security Group: HTTP/HTTPS from internet
  • ECS Tasks Security Group: Traffic from ALB and inter-service communication
  • RDS Security Group: PostgreSQL access from ECS tasks only
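If you want the ALB-to-task rule to be tighter than "any TCP port", the ingress can be limited to the port the containers actually listen on. This is a sketch under an assumption: `container_port` is a hypothetical variable, and the default of 80 is a guess at the Nginx listener port rather than a value taken from this guide. The resource would replace the ECS tasks security group defined in main.tf (tags omitted).

```hcl
# Hypothetical variable: the port the tasks expose to the load balancer.
variable "container_port" {
  description = "Port exposed by the ECS tasks to the ALB"
  type        = number
  default     = 80
}

resource "aws_security_group" "ecs_tasks" {
  name        = "${var.project_name}-ecs-tasks-sg"
  description = "Security group for ECS tasks"
  vpc_id      = aws_vpc.main.id

  # Only the container port is reachable from the ALB.
  ingress {
    description     = "Container port from ALB only"
    from_port       = var.container_port
    to_port         = var.container_port
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Task-to-task traffic is left open on TCP, as in the original rule.
  ingress {
    description = "Allow traffic within ECS tasks (Service Connect)"
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    self        = true
  }

  egress {
    description = "Allow all outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```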

IAM Roles

  • ECS Task Execution Role: ECR access, CloudWatch logging
  • ECS Task Role: Application-specific permissions, ECS Exec
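For a stricter execution role, the ECR and CloudWatch permissions can be scoped to the repositories and log groups this project creates. The sketch below is one possible shape, not the policy used in this guide; the `ecs_task_execution_scoped` names are hypothetical, and it references resources defined in main.tf. It only narrows access if it is used instead of the managed AmazonECSTaskExecutionRolePolicy, which grants the same actions on all resources.

```hcl
data "aws_iam_policy_document" "ecs_task_execution_scoped" {
  statement {
    sid       = "EcrAuth"
    effect    = "Allow"
    actions   = ["ecr:GetAuthorizationToken"]
    resources = ["*"] # this action does not support resource-level scoping
  }

  statement {
    sid    = "EcrPull"
    effect = "Allow"
    actions = [
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
    ]
    resources = [
      aws_ecr_repository.flask_app.arn,
      aws_ecr_repository.nginx.arn,
    ]
  }

  statement {
    sid     = "WriteLogs"
    effect  = "Allow"
    actions = ["logs:CreateLogStream", "logs:PutLogEvents"]
    # The ":*" suffix covers the log streams inside each log group.
    resources = [
      "${aws_cloudwatch_log_group.flask_app.arn}:*",
      "${aws_cloudwatch_log_group.nginx.arn}:*",
    ]
  }
}

resource "aws_iam_role_policy" "ecs_task_execution_scoped" {
  name   = "${var.project_name}-ecs-task-execution-scoped"
  role   = aws_iam_role.ecs_task_execution_role.id
  policy = data.aws_iam_policy_document.ecs_task_execution_scoped.json
}
```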

Database Layer

RDS PostgreSQL

  • Engine: PostgreSQL 15.4
  • Instance Class: db.t3.micro
  • Multi-AZ: Enabled for high availability
  • Encryption: At rest and in transit
  • Backup: 7-day retention

Container Infrastructure

ECR Repositories

  • Flask App: ecs-microservices/flask-app
  • Nginx: ecs-microservices/nginx
  • Image Scanning: Enabled on push
  • Lifecycle Policy: Keep last 10 images
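Because both repositories share the same settings, they could be declared once with `for_each` rather than as two near-identical resource pairs. This is an alternative sketch, not the layout used in main.tf; the `service` resource names and the `ecr_repositories` local are illustrative.

```hcl
locals {
  # Repository suffixes taken from the list above.
  ecr_repositories = toset(["flask-app", "nginx"])
}

resource "aws_ecr_repository" "service" {
  for_each = local.ecr_repositories

  name                 = "${var.project_name}/${each.key}"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}

# One lifecycle policy per repository, all with the same "keep last 10" rule.
resource "aws_ecr_lifecycle_policy" "service" {
  for_each   = aws_ecr_repository.service
  repository = each.value.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last 10 images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = 10
      }
      action = {
        type = "expire"
      }
    }]
  })
}
```

Adding a third service then means adding one string to the set. Note that outputs would need to reference `aws_ecr_repository.service["flask-app"]` instead of `aws_ecr_repository.flask_app`.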

ECS Cluster

  • Type: Fargate (serverless)
  • Container Insights: Enabled
  • Capacity Providers: FARGATE, FARGATE_SPOT

Step-by-Step Implementation

Now let’s implement our infrastructure using Terraform. We’ll follow best practices for organization, security, and maintainability.

Step 1: Project Structure Setup

Create a well-organized project structure that follows Terraform best practices:

# Create project directory
mkdir -p ecs-cicd-project/terraform
cd ecs-cicd-project/terraform

# Create additional directories for organization
mkdir -p modules/{networking,database,compute,security}
mkdir -p environments/{dev,staging,prod}

Step 2: Terraform Configuration Architecture

We’ll keep the root configuration in a small set of files, one per concern, for better maintainability:

terraform/
├── main.tf                   # Core infrastructure
├── variables.tf              # Input variables
├── outputs.tf                # Output values
├── provider.tf               # Provider configuration
├── terraform.tfvars          # Variable values (not in git)
├── terraform.tfvars.example  # Example variables
└── .gitignore                # Git ignore file

Step 3: Create Terraform Configuration Files

Let’s create each configuration file with production-grade settings and comprehensive documentation.

3.1: Create provider.tf

This file configures the AWS provider with production-grade settings and optional remote state management.

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Optional: Configure S3 backend for team collaboration
  # backend "s3" {
  #   bucket         = "your-terraform-state-bucket"
  #   key            = "ecs-cicd/terraform.tfstate"
  #   region         = "ap-south-1"
  #   encrypt        = true
  #   dynamodb_table = "terraform-state-lock"
  # }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "Terraform"
    }
  }
}

3.2: Create variables.tf

Define the input variables for the infrastructure, each with a type, a description, and (except db_password) a default.

variable "aws_region" {
  description = "AWS region for resources"
  type        = string
  default     = "ap-south-1"
}

variable "project_name" {
  description = "Project name for resource naming"
  type        = string
  default     = "ecs-microservices"
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
  default     = "dev"
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "Availability zones for multi-AZ deployment"
  type        = list(string)
  default     = ["ap-south-1a", "ap-south-1b"]
}

variable "db_username" {
  description = "Master username for RDS"
  type        = string
  default     = "dbadmin"
  sensitive   = true
}

variable "db_password" {
  description = "Master password for RDS"
  type        = string
  sensitive   = true
}

variable "db_name" {
  description = "Database name"
  type        = string
  default     = "microservices_db"
}

variable "db_instance_class" {
  description = "RDS instance class"
  type        = string
  default     = "db.t3.micro"
}

variable "db_allocated_storage" {
  description = "Allocated storage for RDS in GB"
  type        = number
  default     = 20
}
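The declarations above do not include validation rules. If you want Terraform to reject bad input at plan time, `validation` blocks can be added. The sketch below shows three examples; each block would replace the plain declaration of the same name in variables.tf, since Terraform does not allow a variable to be declared twice. The specific rules are suggestions, not requirements from this guide.

```hcl
variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment value must be one of dev, staging, or prod."
  }
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() fails on malformed input, and can() turns that failure into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, for example 10.0.0.0/16."
  }
}

variable "db_password" {
  description = "Master password for RDS"
  type        = string
  sensitive   = true

  validation {
    # RDS requires a master password of at least 8 characters.
    condition     = length(var.db_password) >= 8
    error_message = "The db_password value must be at least 8 characters long."
  }
}
```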

3.3: Create main.tf

This is the core infrastructure file containing VPC, subnets, gateways, RDS, ECS cluster, and security configurations with production-grade settings.

# ============================================
# VPC and Networking
# ============================================

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-vpc"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-igw"
  }
}

# ============================================
# Public Subnets (for ALB and NAT Gateways)
# ============================================

resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 1}.0/24"
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-subnet-${count.index + 1}"
    Type = "Public"
  }
}

# Public Route Table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-public-rt"
  }
}

# Public Route Table Associations
resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# ============================================
# Private Subnets (for ECS tasks)
# ============================================

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 11}.0/24"
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-private-subnet-${count.index + 1}"
    Type = "Private"
  }
}

# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
  count  = 2
  domain = "vpc"

  tags = {
    Name = "${var.project_name}-nat-eip-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

# NAT Gateways (one per AZ for high availability)
resource "aws_nat_gateway" "main" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.project_name}-nat-gw-${count.index + 1}"
  }
}

# Private Route Tables (one per AZ)
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "${var.project_name}-private-rt-${count.index + 1}"
  }
}

# Private Route Table Associations
resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# ============================================
# Database Subnets
# ============================================

resource "aws_subnet" "database" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 21}.0/24"
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-db-subnet-${count.index + 1}"
    Type = "Database"
  }
}

# Database Subnet Group
resource "aws_db_subnet_group" "main" {
  name       = "${var.project_name}-db-subnet-group"
  subnet_ids = aws_subnet.database[*].id

  tags = {
    Name = "${var.project_name}-db-subnet-group"
  }
}

# Database Route Tables (no internet access)
resource "aws_route_table" "database" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-db-rt"
  }
}

# Database Route Table Associations
resource "aws_route_table_association" "database" {
  count          = 2
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database.id
}

# ============================================
# Security Groups
# ============================================

# ALB Security Group
resource "aws_security_group" "alb" {
  name        = "${var.project_name}-alb-sg"
  description = "Security group for Application Load Balancer"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTP from internet"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS from internet"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "Allow all outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-alb-sg"
  }
}

# ECS Tasks Security Group
resource "aws_security_group" "ecs_tasks" {
  name        = "${var.project_name}-ecs-tasks-sg"
  description = "Security group for ECS tasks"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "Allow traffic from ALB"
    from_port       = 0
    to_port         = 65535
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  ingress {
    description = "Allow traffic within ECS tasks (Service Connect)"
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    self        = true
  }

  egress {
    description = "Allow all outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-ecs-tasks-sg"
  }
}

# RDS Security Group
resource "aws_security_group" "rds" {
  name        = "${var.project_name}-rds-sg"
  description = "Security group for RDS PostgreSQL"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "PostgreSQL from ECS tasks"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.ecs_tasks.id]
  }

  egress {
    description = "Allow all outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-rds-sg"
  }
}

# ============================================
# RDS PostgreSQL Database
# ============================================

resource "aws_db_instance" "main" {
  identifier             = "${var.project_name}-postgres"
  engine                 = "postgres"
  engine_version         = "15.4"
  instance_class         = var.db_instance_class
  allocated_storage      = var.db_allocated_storage
  storage_type           = "gp3"
  storage_encrypted      = true

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password

  multi_az               = true
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  backup_retention_period = 7
  backup_window          = "03:00-04:00"
  maintenance_window     = "mon:04:00-mon:05:00"

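  # Skip the final snapshot only outside prod. When the snapshot is skipped,
  # final_snapshot_identifier is ignored.
  skip_final_snapshot       = var.environment != "prod"
  final_snapshot_identifier = "${var.project_name}-postgres-final-snapshot"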

  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]

  tags = {
    Name = "${var.project_name}-postgres"
  }
}

# ============================================
# ECR Repositories
# ============================================

resource "aws_ecr_repository" "flask_app" {
  name                 = "${var.project_name}/flask-app"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }

  tags = {
    Name = "${var.project_name}-flask-app"
  }
}

resource "aws_ecr_repository" "nginx" {
  name                 = "${var.project_name}/nginx"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }

  tags = {
    Name = "${var.project_name}-nginx"
  }
}

# ECR Lifecycle Policy (keep last 10 images)
resource "aws_ecr_lifecycle_policy" "flask_app" {
  repository = aws_ecr_repository.flask_app.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last 10 images"
      selection = {
        tagStatus     = "any"
        countType     = "imageCountMoreThan"
        countNumber   = 10
      }
      action = {
        type = "expire"
      }
    }]
  })
}

resource "aws_ecr_lifecycle_policy" "nginx" {
  repository = aws_ecr_repository.nginx.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last 10 images"
      selection = {
        tagStatus     = "any"
        countType     = "imageCountMoreThan"
        countNumber   = 10
      }
      action = {
        type = "expire"
      }
    }]
  })
}

# ============================================
# ECS Cluster
# ============================================

resource "aws_ecs_cluster" "main" {
  name = "${var.project_name}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  configuration {
    execute_command_configuration {
      logging = "OVERRIDE"
      log_configuration {
        cloud_watch_log_group_name = aws_cloudwatch_log_group.ecs_exec.name
      }
    }
  }

  tags = {
    Name = "${var.project_name}-cluster"
  }
}

resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name = aws_ecs_cluster.main.name

  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    weight           = 1
    base             = 1
  }
}

# ============================================
# CloudWatch Log Groups
# ============================================

resource "aws_cloudwatch_log_group" "ecs_exec" {
  name              = "/ecs/${var.project_name}/exec"
  retention_in_days = 7

  tags = {
    Name = "${var.project_name}-ecs-exec-logs"
  }
}

resource "aws_cloudwatch_log_group" "flask_app" {
  name              = "/ecs/${var.project_name}/flask-app"
  retention_in_days = 7

  tags = {
    Name = "${var.project_name}-flask-app-logs"
  }
}

resource "aws_cloudwatch_log_group" "nginx" {
  name              = "/ecs/${var.project_name}/nginx"
  retention_in_days = 7

  tags = {
    Name = "${var.project_name}-nginx-logs"
  }
}

# ============================================
# IAM Roles for ECS
# ============================================

# ECS Task Execution Role (used by ECS agent)
resource "aws_iam_role" "ecs_task_execution_role" {
  name = "${var.project_name}-ecs-task-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })

  tags = {
    Name = "${var.project_name}-ecs-task-execution-role"
  }
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution_role_policy" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Additional policy for ECR and CloudWatch (these actions are also covered by the managed AmazonECSTaskExecutionRolePolicy attached above)
resource "aws_iam_role_policy" "ecs_task_execution_role_custom" {
  name = "${var.project_name}-ecs-task-execution-custom"
  role = aws_iam_role.ecs_task_execution_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ecr:GetAuthorizationToken",
          "ecr:BatchCheckLayerAvailability",
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "*"
      }
    ]
  })
}

# ECS Task Role (used by the application)
resource "aws_iam_role" "ecs_task_role" {
  name = "${var.project_name}-ecs-task-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })

  tags = {
    Name = "${var.project_name}-ecs-task-role"
  }
}

# Allow ECS Exec for debugging
resource "aws_iam_role_policy" "ecs_task_role_exec" {
  name = "${var.project_name}-ecs-task-exec"
  role = aws_iam_role.ecs_task_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ssmmessages:CreateControlChannel",
          "ssmmessages:CreateDataChannel",
          "ssmmessages:OpenControlChannel",
          "ssmmessages:OpenDataChannel"
        ]
        Resource = "*"
      }
    ]
  })
}

3.4: Create outputs.tf

Define comprehensive outputs that will be used in later phases and for monitoring.

# VPC and Networking Outputs
output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "Public subnet IDs"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "Private subnet IDs"
  value       = aws_subnet.private[*].id
}

output "database_subnet_ids" {
  description = "Database subnet IDs"
  value       = aws_subnet.database[*].id
}

# Security Group Outputs
output "alb_security_group_id" {
  description = "ALB security group ID"
  value       = aws_security_group.alb.id
}

output "ecs_tasks_security_group_id" {
  description = "ECS tasks security group ID"
  value       = aws_security_group.ecs_tasks.id
}

output "rds_security_group_id" {
  description = "RDS security group ID"
  value       = aws_security_group.rds.id
}

# RDS Outputs
output "rds_endpoint" {
  description = "RDS endpoint"
  value       = aws_db_instance.main.endpoint
}

output "rds_database_name" {
  description = "RDS database name"
  value       = aws_db_instance.main.db_name
}

output "rds_username" {
  description = "RDS master username"
  value       = var.db_username
  sensitive   = true
}

# ECR Outputs
output "flask_app_repository_url" {
  description = "Flask app ECR repository URL"
  value       = aws_ecr_repository.flask_app.repository_url
}

output "nginx_repository_url" {
  description = "Nginx ECR repository URL"
  value       = aws_ecr_repository.nginx.repository_url
}

# ECS Outputs
output "ecs_cluster_name" {
  description = "ECS cluster name"
  value       = aws_ecs_cluster.main.name
}

output "ecs_cluster_id" {
  description = "ECS cluster ID"
  value       = aws_ecs_cluster.main.id
}

output "ecs_task_execution_role_arn" {
  description = "ECS task execution role ARN"
  value       = aws_iam_role.ecs_task_execution_role.arn
}

output "ecs_task_role_arn" {
  description = "ECS task role ARN"
  value       = aws_iam_role.ecs_task_role.arn
}

# CloudWatch Log Groups
output "flask_app_log_group" {
  description = "Flask app CloudWatch log group name"
  value       = aws_cloudwatch_log_group.flask_app.name
}

output "nginx_log_group" {
  description = "Nginx CloudWatch log group name"
  value       = aws_cloudwatch_log_group.nginx.name
}

# Region Output
output "aws_region" {
  description = "AWS region"
  value       = var.aws_region
}

3.5: Create terraform.tfvars.example

Example variables file with defaults for a dev environment. Copy it to terraform.tfvars and customize the values.

# AWS Configuration
aws_region = "ap-south-1"

# Project Configuration
project_name = "ecs-microservices"
environment  = "dev"

# Network Configuration
vpc_cidr           = "10.0.0.0/16"
availability_zones = ["ap-south-1a", "ap-south-1b"]

# Database Configuration
db_username         = "dbadmin"
db_password         = "YourSecurePassword123!"  # Change this!
db_name             = "microservices_db"
db_instance_class   = "db.t3.micro"
db_allocated_storage = 20

Step 4: Deploy Infrastructure

4.1: Create Your Variables File

# Copy the example file
cp terraform.tfvars.example terraform.tfvars

# Edit with your values (especially db_password!)
nano terraform.tfvars

Important: Set a strong db_password in your terraform.tfvars file.

4.2: Initialize Terraform

terraform init

This will download the required AWS provider plugins and initialize the backend.

4.3: Validate Configuration

terraform validate

Ensure there are no syntax errors and the configuration is valid.

4.4: Plan the Infrastructure

terraform plan

Review the resources that will be created. You should see:

  • Networking: 1 VPC, 6 Subnets (2 public, 2 private, 2 database)
  • Gateways: 2 NAT Gateways (with Elastic IPs), 1 Internet Gateway
  • Routing: Route tables and associations
  • Security: 3 Security Groups with least-privilege access
  • Database: 1 RDS PostgreSQL instance (Multi-AZ)
  • Container Registry: 2 ECR repositories with lifecycle policies
  • Compute: 1 ECS Cluster with Container Insights
  • IAM: Roles and policies for secure execution
  • Monitoring: CloudWatch log groups

4.5: Apply the Configuration

terraform apply

Type yes when prompted. This will take approximately 15-20 minutes due to:

  • RDS Multi-AZ deployment: ~10-15 minutes
  • NAT Gateway creation: ~5 minutes
  • VPC and subnet creation: ~2-3 minutes

Step 5: Verify Infrastructure

5.1: Check ECS Cluster

aws ecs describe-clusters \
  --clusters ecs-microservices-cluster \
  --region ap-south-1

5.2: Check ECR Repositories

aws ecr describe-repositories --region ap-south-1

You should see two repositories:

  • ecs-microservices/flask-app
  • ecs-microservices/nginx

5.3: Check RDS Instance

aws rds describe-db-instances \
  --db-instance-identifier ecs-microservices-postgres \
  --region ap-south-1

Verify the status is available.

5.4: Save Important Outputs

terraform output

Save these outputs - you’ll need them in later phases:

  • ECR repository URLs
  • RDS endpoint
  • VPC and subnet IDs
  • Security group IDs
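Instead of copying these values by hand, a later Terraform configuration can read them from this project's state. The sketch below assumes the S3 backend from provider.tf has been enabled with its placeholder bucket and key; the data source name `infra` and the local names are illustrative.

```hcl
# Reads the outputs of this infrastructure project from its remote state.
# Assumes the S3 backend shown in provider.tf is in use.
data "terraform_remote_state" "infra" {
  backend = "s3"

  config = {
    bucket = "your-terraform-state-bucket"
    key    = "ecs-cicd/terraform.tfstate"
    region = "ap-south-1"
  }
}

locals {
  # Output names match those declared in outputs.tf.
  private_subnet_ids    = data.terraform_remote_state.infra.outputs.private_subnet_ids
  ecs_tasks_sg_id       = data.terraform_remote_state.infra.outputs.ecs_tasks_security_group_id
  flask_app_repo_url    = data.terraform_remote_state.infra.outputs.flask_app_repository_url
  task_execution_role   = data.terraform_remote_state.infra.outputs.ecs_task_execution_role_arn
}
```

With local state, the same values are available from `terraform output` in this directory.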

Step 6: Understand the Costs

Estimated Monthly Costs (ap-south-1):

Resource            Configuration                  Est. Cost/Month
------------------  -----------------------------  ---------------
NAT Gateways (2)    Always running                 ~$65
RDS PostgreSQL      db.t3.micro Multi-AZ           ~$35
Elastic IPs (2)     For NAT Gateways               ~$7
ECS Cluster         No cost (serverless)           $0
ECR Repositories    No cost until images stored    $0
Total                                              ~$107/month

Cost Optimization Tips:

  1. NAT Gateways are the biggest cost - consider using 1 instead of 2 for dev
  2. Use db.t3.micro for development
  3. Clean up resources when not in use (see CLEANUP.md)
  4. Use Fargate Spot for non-production workloads
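As a sketch of the Fargate Spot tip, the cluster's default capacity provider strategy can weight tasks toward Spot outside production. The resource below would replace the `aws_ecs_cluster_capacity_providers` block in main.tf; the 1:3 weighting is an arbitrary illustration, not a recommendation from this guide.

```hcl
resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name       = aws_ecs_cluster.main.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  # Always keep at least one task on on-demand Fargate.
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1
    weight            = 1
  }

  # Outside prod, place roughly three of every four additional tasks on Spot.
  # In prod the weight of 0 keeps all tasks on on-demand Fargate.
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = var.environment == "prod" ? 0 : 3
  }
}
```

Spot tasks can be interrupted with a two-minute warning, so this suits stateless services that tolerate restarts.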

Troubleshooting

Issue: Terraform apply fails with “subnet conflicts”

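Solution: Ensure your CIDR blocks don’t overlap. The default values work as-is. Note that the subnet CIDRs in main.tf are hardcoded as 10.0.x.0/24, so they must be updated if you change vpc_cidr to a range outside 10.0.0.0/16.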

Issue: RDS creation times out

Solution: RDS Multi-AZ creation can take 15-20 minutes, so give it time before retrying. If it fails, check the instance status:

aws rds describe-db-instances \
  --db-instance-identifier ecs-microservices-postgres \
  --region ap-south-1

Issue: Access denied errors

Solution: Ensure your AWS credentials have sufficient permissions:

  • EC2 (VPC, Subnets, Security Groups)
  • RDS (CreateDBInstance, CreateDBSubnetGroup)
  • ECR (CreateRepository)
  • ECS (CreateCluster)
  • IAM (CreateRole, AttachRolePolicy)

Production Best Practices

Security Considerations

  1. Secrets Management: Never commit terraform.tfvars to version control
  2. State Security: Use S3 backend with encryption for team collaboration
  3. Least Privilege: IAM roles follow principle of least privilege
  4. Network Isolation: Database subnets have no internet access
  5. Encryption: All data encrypted at rest and in transit

Monitoring and Observability

  1. CloudWatch Logs: Centralized logging for all services
  2. Container Insights: Enabled for ECS cluster monitoring
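  3. RDS Monitoring: PostgreSQL and upgrade logs are exported to CloudWatch; Enhanced Monitoring additionally requires monitoring_interval and a monitoring role, which main.tf does not set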
  4. Cost Tracking: Use AWS Cost Explorer to monitor spending
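The main.tf in this guide does not set `monitoring_interval`, so RDS Enhanced Monitoring is off unless you add it. A minimal sketch of what enabling it could look like follows; the role name `rds_enhanced_monitoring` is hypothetical, and the 60-second interval is just one of the values RDS accepts.

```hcl
# Role that RDS assumes to publish OS-level metrics to CloudWatch Logs.
resource "aws_iam_role" "rds_enhanced_monitoring" {
  name = "${var.project_name}-rds-monitoring-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "monitoring.rds.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "rds_enhanced_monitoring" {
  role       = aws_iam_role.rds_enhanced_monitoring.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}

# Then add these two arguments inside resource "aws_db_instance" "main":
#   monitoring_interval = 60
#   monitoring_role_arn = aws_iam_role.rds_enhanced_monitoring.arn
```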

Backup and Recovery

  1. RDS Backups: 7-day automated backups
  2. Terraform State: Critical for infrastructure management
  3. ECR Images: Lifecycle policies prevent storage bloat
  4. Documentation: Keep infrastructure documentation updated

Next Steps

Before moving on, confirm that:

  1. Core infrastructure is deployed and verified
  2. Database is ready with Multi-AZ failover
  3. ECR repositories are ready for container images
  4. ECS cluster is ready for Fargate workloads
  5. Security groups are configured with least privilege
  6. Monitoring is set up with CloudWatch

Proceed to Part 3: Application Containerization where we’ll build the Flask and Nginx applications and create production-ready Docker images.

Important Notes

Security

The terraform.tfvars file contains sensitive data. Add it to .gitignore:

echo "terraform.tfvars" >> .gitignore
echo "*.tfstate*" >> .gitignore
echo ".terraform/" >> .gitignore

State Management

For team collaboration, configure an S3 backend. The block is commented out in provider.tf and must sit inside the terraform block:

terraform {
  backend "s3" {
    bucket         = "your-terraform-state-bucket"
    key            = "ecs-cicd/terraform.tfstate"
    region         = "ap-south-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
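The bucket and lock table named in the backend block have to exist before `terraform init` can use them. One common approach is to create them from a small, separate Terraform configuration that keeps its own local state. The sketch below follows that approach and reuses the placeholder names from the backend block; the resource labels are illustrative.

```hcl
resource "aws_s3_bucket" "terraform_state" {
  bucket = "your-terraform-state-bucket" # placeholder; bucket names are globally unique
}

# Versioning allows recovery of earlier state files.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The S3 backend expects a string hash key named LockID for state locking.
resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

After these resources exist, enabling the backend block and running `terraform init` prompts to migrate any existing local state into the bucket.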

Multi-Environment

To create multiple environments (dev/staging/prod), use Terraform workspaces or separate state files.
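With workspaces, the active workspace name is available as `terraform.workspace` and can select per-environment settings. The following is a sketch only: the `env_settings` map and its values are illustrative and are not taken from this guide's configuration.

```hcl
locals {
  # Treat the built-in "default" workspace as dev.
  environment = terraform.workspace == "default" ? "dev" : terraform.workspace

  # Illustrative per-environment values, not part of the original configuration.
  env_settings = {
    dev     = { db_multi_az = false, nat_gateway_count = 1 }
    staging = { db_multi_az = false, nat_gateway_count = 2 }
    prod    = { db_multi_az = true, nat_gateway_count = 2 }
  }

  # Fails at plan time if the workspace name is not a key in the map.
  current = local.env_settings[local.environment]
}

output "active_environment" {
  description = "Environment derived from the current workspace"
  value       = local.environment
}

output "active_settings" {
  description = "Settings selected for the current workspace"
  value       = local.current
}
```

Resources would then read values such as `local.current.db_multi_az` instead of hardcoded ones. Workspaces are created and selected with `terraform workspace new staging` and `terraform workspace select staging`.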

Backup

Always keep a backup of your terraform.tfstate file. It’s critical for managing infrastructure.

Key Takeaways

This infrastructure provides:

  • Production-ready foundation for ECS microservices
  • High availability across multiple availability zones
  • Security best practices with network isolation and least privilege
  • Cost optimization with detailed cost breakdown
  • Infrastructure as Code with Terraform
  • Comprehensive monitoring and logging setup
  • Scalable architecture ready for container workloads

This foundation will serve as the basis for deploying containerized applications in the next phase. The infrastructure is designed to be secure, scalable, and cost-effective for production workloads.


Ready for the next phase? In Part 3, we’ll containerize our applications and prepare them for deployment to this infrastructure!

Questions or feedback? Feel free to reach out in the comments below!