Step-by-step guide to building production-ready infrastructure for ECS microservices using Terraform, covering VPC design, security groups, RDS setup, ECR repositories, and ECS cluster configuration.
Welcome to Part 2 of our comprehensive series on building production-grade ECS microservices! In this installment, we’ll dive deep into creating the foundational infrastructure using Terraform, following Infrastructure as Code (IaC) best practices.
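In this phase, we’ll establish the production-ready infrastructure foundation: a multi-AZ VPC with public, private, and database subnets, security groups, IAM roles, an RDS PostgreSQL database, ECR repositories, and an ECS cluster.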
Our infrastructure follows AWS Well-Architected Framework principles, implementing a robust, scalable, and cost-effective foundation.
┌─────────────────────────────────────────────────────────────────┐
│                        Internet Gateway                         │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Public Subnets (2 AZs)                      │
│      ┌─────────────────────┐       ┌─────────────────────┐      │
│      │  ap-south-1a        │       │  ap-south-1b        │      │
│      │  10.0.1.0/24        │       │  10.0.2.0/24        │      │
│      │                     │       │                     │      │
│      │  ┌───────────────┐  │       │  ┌───────────────┐  │      │
│      │  │  NAT Gateway  │  │       │  │  NAT Gateway  │  │      │
│      │  │  + Elastic IP │  │       │  │  + Elastic IP │  │      │
│      │  └───────────────┘  │       │  └───────────────┘  │      │
│      └─────────────────────┘       └─────────────────────┘      │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Private Subnets (2 AZs)                     │
│      ┌─────────────────────┐       ┌─────────────────────┐      │
│      │  ap-south-1a        │       │  ap-south-1b        │      │
│      │  10.0.11.0/24       │       │  10.0.12.0/24       │      │
│      │                     │       │                     │      │
│      │  ┌───────────────┐  │       │  ┌───────────────┐  │      │
│      │  │  ECS Fargate  │  │       │  │  ECS Fargate  │  │      │
│      │  │     Tasks     │  │       │  │     Tasks     │  │      │
│      │  └───────────────┘  │       │  └───────────────┘  │      │
│      └─────────────────────┘       └─────────────────────┘      │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Database Subnets (2 AZs)                     │
│      ┌─────────────────────┐       ┌─────────────────────┐      │
│      │  ap-south-1a        │       │  ap-south-1b        │      │
│      │  10.0.21.0/24       │       │  10.0.22.0/24       │      │
│      │                     │       │                     │      │
│      │  ┌───────────────┐  │       │  ┌───────────────┐  │      │
│      │  │  RDS Primary  │  │       │  │  RDS Standby  │  │      │
│      │  │  PostgreSQL   │  │       │  │  PostgreSQL   │  │      │
│      │  └───────────────┘  │       │  └───────────────┘  │      │
│      └─────────────────────┘       └─────────────────────┘      │
└─────────────────────────────────────────────────────────────────┘
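The subnet ranges in the diagram follow a regular pattern: each tier takes /24 blocks out of the 10.0.0.0/16 VPC at offsets 1, 11, and 21. As a sketch, the same ranges can be derived with Terraform's built-in `cidrsubnet` function instead of string interpolation, so they stay valid if the VPC CIDR changes. The local names are illustrative, and the block assumes the `vpc_cidr` and `availability_zones` variables from variables.tf.

```hcl
# Assumes var.vpc_cidr (default "10.0.0.0/16") and var.availability_zones
# from variables.tf. The local names are illustrative.
locals {
  az_count = length(var.availability_zones)

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields /24 blocks.
  public_subnet_cidrs   = [for i in range(local.az_count) : cidrsubnet(var.vpc_cidr, 8, i + 1)]
  private_subnet_cidrs  = [for i in range(local.az_count) : cidrsubnet(var.vpc_cidr, 8, i + 11)]
  database_subnet_cidrs = [for i in range(local.az_count) : cidrsubnet(var.vpc_cidr, 8, i + 21)]
}

# With the default variable values this yields:
#   public_subnet_cidrs   = ["10.0.1.0/24",  "10.0.2.0/24"]
#   private_subnet_cidrs  = ["10.0.11.0/24", "10.0.12.0/24"]
#   database_subnet_cidrs = ["10.0.21.0/24", "10.0.22.0/24"]
```

A subnet resource could then set `cidr_block = local.public_subnet_cidrs[count.index]` in place of a hard-coded string.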
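Before we begin, ensure you have Terraform 1.0 or later installed, the AWS CLI configured with credentials, and an AWS account with permission to create VPC, RDS, ECR, ECS, IAM, and CloudWatch resources.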
Estimated Monthly Costs (ap-south-1):
| Resource | Configuration | Est. Cost/Month |
|---|---|---|
| NAT Gateways (2) | Always running | ~$65 |
| RDS PostgreSQL | db.t3.micro Multi-AZ | ~$35 |
| Elastic IPs (2) | For NAT Gateways | ~$7 |
| ECS Cluster | No cost (serverless) | $0 |
| ECR Repositories | No cost until images stored | $0 |
| Total | | ~$107/month |
💡 Cost Optimization Tip: For development, consider using 1 NAT Gateway instead of 2 to save ~$32/month.
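One way to make that tip switchable is a boolean toggle. The sketch below adapts the NAT resources from main.tf (shown later in this guide); the `single_nat_gateway` variable and `nat_gateway_count` local are hypothetical names, and tags are left out for brevity.

```hcl
# Hypothetical toggle, not part of the original variables.tf.
variable "single_nat_gateway" {
  description = "Use one shared NAT Gateway (dev) instead of one per AZ"
  type        = bool
  default     = false
}

locals {
  nat_gateway_count = var.single_nat_gateway ? 1 : 2
}

resource "aws_eip" "nat" {
  count      = local.nat_gateway_count
  domain     = "vpc"
  depends_on = [aws_internet_gateway.main]
}

resource "aws_nat_gateway" "main" {
  count         = local.nat_gateway_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
}

# Still one private route table per AZ; with a single NAT Gateway
# both tables route through gateway 0.
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[var.single_nat_gateway ? 0 : count.index].id
  }
}
```

The trade-off is availability: with a single gateway, an outage in its AZ cuts outbound internet access for tasks in both AZs, so this suits development rather than production.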
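VPC (Virtual Private Cloud): a 10.0.0.0/16 network with DNS support and DNS hostnames enabled.
Subnet Architecture: public, private, and database subnets in two availability zones (ap-south-1a and ap-south-1b), with one NAT Gateway per AZ for outbound traffic from the private subnets.
Security Groups: the ALB accepts HTTP and HTTPS from the internet, ECS tasks accept TCP from the ALB and from each other, and RDS accepts PostgreSQL (port 5432) only from ECS tasks.
IAM Roles: a task execution role for pulling images and writing logs, and a task role for the application that also permits ECS Exec.
RDS PostgreSQL: a Multi-AZ PostgreSQL 15.4 instance with encrypted gp3 storage and 7-day backup retention.
ECR Repositories: one repository each for the Flask app and Nginx, with scan-on-push and a lifecycle policy that keeps the last 10 images.
ECS Cluster: a cluster with Container Insights enabled and the FARGATE and FARGATE_SPOT capacity providers.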
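On the security group side, the ECS tasks group defined in main.tf below accepts every TCP port from the ALB. If the containers listen on one known port, that rule can be narrowed. A minimal sketch, assuming a hypothetical `container_port` variable (the port 80 default is a placeholder, not something this series specifies); tags are omitted:

```hcl
# Hypothetical variable; adjust the default to the port your containers expose.
variable "container_port" {
  description = "Port the containers listen on"
  type        = number
  default     = 80
}

resource "aws_security_group" "ecs_tasks" {
  name        = "${var.project_name}-ecs-tasks-sg"
  description = "Security group for ECS tasks"
  vpc_id      = aws_vpc.main.id

  # Only the container port is reachable from the ALB.
  ingress {
    description     = "Container port from ALB"
    from_port       = var.container_port
    to_port         = var.container_port
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Task-to-task traffic is unchanged from the original definition.
  ingress {
    description = "Allow traffic within ECS tasks (Service Connect)"
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    self        = true
  }

  egress {
    description = "Allow all outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

This block would replace the `aws_security_group.ecs_tasks` resource in main.tf rather than sit alongside it.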
Now let’s implement our infrastructure using Terraform. We’ll follow best practices for organization, security, and maintainability.
Create a well-organized project structure that follows Terraform best practices:
# Create project directory
mkdir -p ecs-cicd-project/terraform
cd ecs-cicd-project/terraform
# Create additional directories for organization
mkdir -p modules/{networking,database,compute,security}
mkdir -p environments/{dev,staging,prod}
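For this guide, the Terraform configuration lives in a flat set of files in the terraform directory; the modules and environments directories created above are placeholders for later refactoring: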
terraform/
├── main.tf # Core infrastructure
├── variables.tf # Input variables
├── outputs.tf # Output values
├── provider.tf # Provider configuration
├── terraform.tfvars # Variable values (not in git)
├── terraform.tfvars.example # Example variables
└── .gitignore # Git ignore file
Let’s create each configuration file with production-grade settings and comprehensive documentation.
provider.tf
This file configures the AWS provider with production-grade settings and optional remote state management.
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
# Optional: Configure S3 backend for team collaboration
# backend "s3" {
# bucket = "your-terraform-state-bucket"
# key = "ecs-cicd/terraform.tfstate"
# region = "ap-south-1"
# encrypt = true
# dynamodb_table = "terraform-state-lock"
# }
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = var.project_name
Environment = var.environment
ManagedBy = "Terraform"
}
}
}
variables.tf
Define all input variables for the infrastructure, each with a type, a description, and a default where one makes sense. The database password has no default and must be supplied.
variable "aws_region" {
description = "AWS region for resources"
type = string
default = "ap-south-1"
}
variable "project_name" {
description = "Project name for resource naming"
type = string
default = "ecs-microservices"
}
variable "environment" {
description = "Environment name (dev, staging, prod)"
type = string
default = "dev"
}
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
default = "10.0.0.0/16"
}
variable "availability_zones" {
description = "Availability zones for multi-AZ deployment"
type = list(string)
default = ["ap-south-1a", "ap-south-1b"]
}
variable "db_username" {
description = "Master username for RDS"
type = string
default = "dbadmin"
sensitive = true
}
variable "db_password" {
description = "Master password for RDS"
type = string
sensitive = true
}
variable "db_name" {
description = "Database name"
type = string
default = "microservices_db"
}
variable "db_instance_class" {
description = "RDS instance class"
type = string
default = "db.t3.micro"
}
variable "db_allocated_storage" {
description = "Allocated storage for RDS in GB"
type = number
default = 20
}
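The variables above rely on types and defaults alone. As a sketch of how input validation could be layered on, the following versions of three of those variables add `validation` blocks. The specific rules are assumptions chosen for illustration (the 8-character minimum reflects the RDS master password minimum), and each block would replace its counterpart above rather than be added next to it.

```hcl
variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, staging, prod."
  }
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() fails on malformed input, so can() turns that into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid IPv4 CIDR block, such as 10.0.0.0/16."
  }
}

variable "db_password" {
  description = "Master password for RDS"
  type        = string
  sensitive   = true

  validation {
    condition     = length(var.db_password) >= 8
    error_message = "The db_password value must be at least 8 characters long."
  }
}
```

With these in place, `terraform plan` stops early with the error message instead of failing partway through an apply.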
main.tf
This is the core infrastructure file containing VPC, subnets, gateways, RDS, ECS cluster, and security configurations with production-grade settings.
# ============================================
# VPC and Networking
# ============================================
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.project_name}-vpc"
}
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.project_name}-igw"
}
}
# ============================================
# Public Subnets (for ALB)
# ============================================
resource "aws_subnet" "public" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${count.index + 1}.0/24"
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.project_name}-public-subnet-${count.index + 1}"
Type = "Public"
}
}
# Public Route Table
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.project_name}-public-rt"
}
}
# Public Route Table Associations
resource "aws_route_table_association" "public" {
count = 2
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# ============================================
# Private Subnets (for ECS tasks)
# ============================================
resource "aws_subnet" "private" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${count.index + 11}.0/24"
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.project_name}-private-subnet-${count.index + 1}"
Type = "Private"
}
}
# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
count = 2
domain = "vpc"
tags = {
Name = "${var.project_name}-nat-eip-${count.index + 1}"
}
depends_on = [aws_internet_gateway.main]
}
# NAT Gateways (one per AZ for high availability)
resource "aws_nat_gateway" "main" {
count = 2
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = {
Name = "${var.project_name}-nat-gw-${count.index + 1}"
}
}
# Private Route Tables (one per AZ)
resource "aws_route_table" "private" {
count = 2
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
tags = {
Name = "${var.project_name}-private-rt-${count.index + 1}"
}
}
# Private Route Table Associations
resource "aws_route_table_association" "private" {
count = 2
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
# ============================================
# Database Subnets
# ============================================
resource "aws_subnet" "database" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${count.index + 21}.0/24"
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.project_name}-db-subnet-${count.index + 1}"
Type = "Database"
}
}
# Database Subnet Group
resource "aws_db_subnet_group" "main" {
name = "${var.project_name}-db-subnet-group"
subnet_ids = aws_subnet.database[*].id
tags = {
Name = "${var.project_name}-db-subnet-group"
}
}
# Database Route Tables (no internet access)
resource "aws_route_table" "database" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.project_name}-db-rt"
}
}
# Database Route Table Associations
resource "aws_route_table_association" "database" {
count = 2
subnet_id = aws_subnet.database[count.index].id
route_table_id = aws_route_table.database.id
}
# ============================================
# Security Groups
# ============================================
# ALB Security Group
resource "aws_security_group" "alb" {
name = "${var.project_name}-alb-sg"
description = "Security group for Application Load Balancer"
vpc_id = aws_vpc.main.id
ingress {
description = "HTTP from internet"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "Allow all outbound"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-alb-sg"
}
}
# ECS Tasks Security Group
resource "aws_security_group" "ecs_tasks" {
name = "${var.project_name}-ecs-tasks-sg"
description = "Security group for ECS tasks"
vpc_id = aws_vpc.main.id
ingress {
description = "Allow traffic from ALB"
from_port = 0
to_port = 65535
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
ingress {
description = "Allow traffic within ECS tasks (Service Connect)"
from_port = 0
to_port = 65535
protocol = "tcp"
self = true
}
egress {
description = "Allow all outbound"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-ecs-tasks-sg"
}
}
# RDS Security Group
resource "aws_security_group" "rds" {
name = "${var.project_name}-rds-sg"
description = "Security group for RDS PostgreSQL"
vpc_id = aws_vpc.main.id
ingress {
description = "PostgreSQL from ECS tasks"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.ecs_tasks.id]
}
egress {
description = "Allow all outbound"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-rds-sg"
}
}
# ============================================
# RDS PostgreSQL Database
# ============================================
resource "aws_db_instance" "main" {
identifier = "${var.project_name}-postgres"
engine = "postgres"
engine_version = "15.4"
instance_class = var.db_instance_class
allocated_storage = var.db_allocated_storage
storage_type = "gp3"
storage_encrypted = true
db_name = var.db_name
username = var.db_username
password = var.db_password
multi_az = true
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "mon:04:00-mon:05:00"
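# Dev-friendly default: no final snapshot is taken on destroy, so final_snapshot_identifier is ignored.
# For production, set skip_final_snapshot = false so the snapshot named below is actually created.
skip_final_snapshot = true
final_snapshot_identifier = "${var.project_name}-postgres-final-snapshot"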
enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]
tags = {
Name = "${var.project_name}-postgres"
}
}
# ============================================
# ECR Repositories
# ============================================
resource "aws_ecr_repository" "flask_app" {
name = "${var.project_name}/flask-app"
image_tag_mutability = "MUTABLE"
image_scanning_configuration {
scan_on_push = true
}
encryption_configuration {
encryption_type = "AES256"
}
tags = {
Name = "${var.project_name}-flask-app"
}
}
resource "aws_ecr_repository" "nginx" {
name = "${var.project_name}/nginx"
image_tag_mutability = "MUTABLE"
image_scanning_configuration {
scan_on_push = true
}
encryption_configuration {
encryption_type = "AES256"
}
tags = {
Name = "${var.project_name}-nginx"
}
}
# ECR Lifecycle Policy (keep last 10 images)
resource "aws_ecr_lifecycle_policy" "flask_app" {
repository = aws_ecr_repository.flask_app.name
policy = jsonencode({
rules = [{
rulePriority = 1
description = "Keep last 10 images"
selection = {
tagStatus = "any"
countType = "imageCountMoreThan"
countNumber = 10
}
action = {
type = "expire"
}
}]
})
}
resource "aws_ecr_lifecycle_policy" "nginx" {
repository = aws_ecr_repository.nginx.name
policy = jsonencode({
rules = [{
rulePriority = 1
description = "Keep last 10 images"
selection = {
tagStatus = "any"
countType = "imageCountMoreThan"
countNumber = 10
}
action = {
type = "expire"
}
}]
})
}
# ============================================
# ECS Cluster
# ============================================
resource "aws_ecs_cluster" "main" {
name = "${var.project_name}-cluster"
setting {
name = "containerInsights"
value = "enabled"
}
configuration {
execute_command_configuration {
logging = "OVERRIDE"
log_configuration {
cloud_watch_log_group_name = aws_cloudwatch_log_group.ecs_exec.name
}
}
}
tags = {
Name = "${var.project_name}-cluster"
}
}
resource "aws_ecs_cluster_capacity_providers" "main" {
cluster_name = aws_ecs_cluster.main.name
capacity_providers = ["FARGATE", "FARGATE_SPOT"]
default_capacity_provider_strategy {
capacity_provider = "FARGATE"
weight = 1
base = 1
}
}
# ============================================
# CloudWatch Log Groups
# ============================================
resource "aws_cloudwatch_log_group" "ecs_exec" {
name = "/ecs/${var.project_name}/exec"
retention_in_days = 7
tags = {
Name = "${var.project_name}-ecs-exec-logs"
}
}
resource "aws_cloudwatch_log_group" "flask_app" {
name = "/ecs/${var.project_name}/flask-app"
retention_in_days = 7
tags = {
Name = "${var.project_name}-flask-app-logs"
}
}
resource "aws_cloudwatch_log_group" "nginx" {
name = "/ecs/${var.project_name}/nginx"
retention_in_days = 7
tags = {
Name = "${var.project_name}-nginx-logs"
}
}
# ============================================
# IAM Roles for ECS
# ============================================
# ECS Task Execution Role (used by ECS agent)
resource "aws_iam_role" "ecs_task_execution_role" {
name = "${var.project_name}-ecs-task-execution-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}]
})
tags = {
Name = "${var.project_name}-ecs-task-execution-role"
}
}
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role_policy" {
role = aws_iam_role.ecs_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
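# Additional inline policy for ECR and CloudWatch Logs.
# These actions are already granted by the managed AmazonECSTaskExecutionRolePolicy
# attached above; they are repeated here only to make the permissions explicit.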
resource "aws_iam_role_policy" "ecs_task_execution_role_custom" {
name = "${var.project_name}-ecs-task-execution-custom"
role = aws_iam_role.ecs_task_execution_role.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "*"
}
]
})
}
# ECS Task Role (used by the application)
resource "aws_iam_role" "ecs_task_role" {
name = "${var.project_name}-ecs-task-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}]
})
tags = {
Name = "${var.project_name}-ecs-task-role"
}
}
# Allow ECS Exec for debugging
resource "aws_iam_role_policy" "ecs_task_role_exec" {
name = "${var.project_name}-ecs-task-exec"
role = aws_iam_role.ecs_task_role.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"ssmmessages:CreateControlChannel",
"ssmmessages:CreateDataChannel",
"ssmmessages:OpenControlChannel",
"ssmmessages:OpenDataChannel"
]
Resource = "*"
}
]
})
}
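The cluster above sends ECS Exec session logs to the `ecs_exec` CloudWatch log group, but the task role only carries the `ssmmessages` permissions. To my understanding of the ECS Exec requirements, the task role also needs CloudWatch Logs permissions for that logging to work. A hedged sketch of the extra policy follows; the resource name `ecs_task_role_exec_logs` is illustrative.

```hcl
# Illustrative addition: lets ECS Exec write session logs to the exec log group.
resource "aws_iam_role_policy" "ecs_task_role_exec_logs" {
  name = "${var.project_name}-ecs-task-exec-logs"
  role = aws_iam_role.ecs_task_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["logs:DescribeLogGroups"]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogStream",
          "logs:DescribeLogStreams",
          "logs:PutLogEvents"
        ]
        # The provider strips the trailing ":*" from the log group ARN,
        # so it is appended here to cover the group's log streams.
        Resource = "${aws_cloudwatch_log_group.ecs_exec.arn}:*"
      }
    ]
  })
}
```

If ECS Exec sessions connect but nothing appears in the log group, missing permissions of this kind are one thing to check.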
outputs.tf
Define comprehensive outputs that will be used in later phases and for monitoring.
# VPC and Networking Outputs
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.main.id
}
output "public_subnet_ids" {
description = "Public subnet IDs"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "Private subnet IDs"
value = aws_subnet.private[*].id
}
output "database_subnet_ids" {
description = "Database subnet IDs"
value = aws_subnet.database[*].id
}
# Security Group Outputs
output "alb_security_group_id" {
description = "ALB security group ID"
value = aws_security_group.alb.id
}
output "ecs_tasks_security_group_id" {
description = "ECS tasks security group ID"
value = aws_security_group.ecs_tasks.id
}
output "rds_security_group_id" {
description = "RDS security group ID"
value = aws_security_group.rds.id
}
# RDS Outputs
output "rds_endpoint" {
description = "RDS endpoint"
value = aws_db_instance.main.endpoint
}
output "rds_database_name" {
description = "RDS database name"
value = aws_db_instance.main.db_name
}
output "rds_username" {
description = "RDS master username"
value = var.db_username
sensitive = true
}
# ECR Outputs
output "flask_app_repository_url" {
description = "Flask app ECR repository URL"
value = aws_ecr_repository.flask_app.repository_url
}
output "nginx_repository_url" {
description = "Nginx ECR repository URL"
value = aws_ecr_repository.nginx.repository_url
}
# ECS Outputs
output "ecs_cluster_name" {
description = "ECS cluster name"
value = aws_ecs_cluster.main.name
}
output "ecs_cluster_id" {
description = "ECS cluster ID"
value = aws_ecs_cluster.main.id
}
output "ecs_task_execution_role_arn" {
description = "ECS task execution role ARN"
value = aws_iam_role.ecs_task_execution_role.arn
}
output "ecs_task_role_arn" {
description = "ECS task role ARN"
value = aws_iam_role.ecs_task_role.arn
}
# CloudWatch Log Groups
output "flask_app_log_group" {
description = "Flask app CloudWatch log group name"
value = aws_cloudwatch_log_group.flask_app.name
}
output "nginx_log_group" {
description = "Nginx CloudWatch log group name"
value = aws_cloudwatch_log_group.nginx.name
}
# Region Output
output "aws_region" {
description = "AWS region"
value = var.aws_region
}
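If a later phase lives in its own Terraform configuration, one way to consume these outputs is the `terraform_remote_state` data source. This sketch assumes the optional S3 backend from provider.tf has been enabled with the same placeholder bucket and key; the data source name `infra` and the local names are illustrative.

```hcl
# Reads the outputs of this infrastructure from its remote state.
# Assumes the S3 backend in provider.tf is enabled with these placeholder values.
data "terraform_remote_state" "infra" {
  backend = "s3"

  config = {
    bucket = "your-terraform-state-bucket"
    key    = "ecs-cicd/terraform.tfstate"
    region = "ap-south-1"
  }
}

locals {
  private_subnet_ids          = data.terraform_remote_state.infra.outputs.private_subnet_ids
  ecs_tasks_security_group_id = data.terraform_remote_state.infra.outputs.ecs_tasks_security_group_id
  ecs_cluster_id              = data.terraform_remote_state.infra.outputs.ecs_cluster_id
  task_execution_role_arn     = data.terraform_remote_state.infra.outputs.ecs_task_execution_role_arn
}
```

If everything stays in one configuration, the resources can simply be referenced directly and this indirection is unnecessary.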
terraform.tfvars.example
Example variables file with production-ready defaults (users should copy this to terraform.tfvars and customize).
# AWS Configuration
aws_region = "ap-south-1"
# Project Configuration
project_name = "ecs-microservices"
environment = "dev"
# Network Configuration
vpc_cidr = "10.0.0.0/16"
availability_zones = ["ap-south-1a", "ap-south-1b"]
# Database Configuration
db_username = "dbadmin"
db_password = "YourSecurePassword123!" # Change this!
db_name = "microservices_db"
db_instance_class = "db.t3.micro"
db_allocated_storage = 20
# Copy the example file
cp terraform.tfvars.example terraform.tfvars
# Edit with your values (especially db_password!)
nano terraform.tfvars
Important: Set a strong db_password in your terraform.tfvars file.
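If you would rather not keep a database password in terraform.tfvars at all, the AWS provider can ask RDS to generate the master password and store it in AWS Secrets Manager. The sketch below is a trimmed variant of the `aws_db_instance` resource from main.tf with `manage_master_user_password` in place of `password`; backup and logging arguments are omitted for brevity, and the output name is illustrative. Treat it as an alternative to evaluate, not the approach this series uses.

```hcl
resource "aws_db_instance" "main" {
  identifier        = "${var.project_name}-postgres"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = var.db_instance_class
  allocated_storage = var.db_allocated_storage
  storage_type      = "gp3"
  storage_encrypted = true

  db_name  = var.db_name
  username = var.db_username

  # Replaces password = var.db_password: RDS generates the master password
  # and keeps it in Secrets Manager. The two arguments cannot be combined.
  manage_master_user_password = true

  multi_az               = true
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  skip_final_snapshot    = true
}

# Illustrative output: where the application can look up the generated password.
output "rds_master_secret_arn" {
  description = "ARN of the Secrets Manager secret holding the master password"
  value       = aws_db_instance.main.master_user_secret[0].secret_arn
}
```

With this variant the `db_password` variable is no longer needed, and the password never appears in terraform.tfvars.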
terraform init
This will download the required AWS provider plugins and initialize the backend.
terraform validate
Ensure there are no syntax errors and the configuration is valid.
terraform plan
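Review the resources that will be created. You should see the VPC and its subnets, the Internet and NAT Gateways, route tables, security groups, the RDS instance, the ECR repositories and lifecycle policies, the ECS cluster, the CloudWatch log groups, and the IAM roles.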
terraform apply
Type yes when prompted. This will take approximately 15-20 minutes, mostly because the Multi-AZ RDS instance is slow to provision; the NAT Gateways also take a few minutes.
aws ecs describe-clusters \
--clusters ecs-microservices-cluster \
--region ap-south-1
aws ecr describe-repositories --region ap-south-1
You should see two repositories:
ecs-microservices/flask-app
ecs-microservices/nginx
aws rds describe-db-instances \
--db-instance-identifier ecs-microservices-postgres \
--region ap-south-1
Verify the status is available.
terraform output
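Save these outputs; later phases need them, in particular the subnet and security group IDs, the ECR repository URLs, the ECS cluster name, and the IAM role ARNs.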
Estimated Monthly Costs (ap-south-1):
| Resource | Configuration | Est. Cost/Month |
|---|---|---|
| NAT Gateways (2) | Always running | ~$65 |
| RDS PostgreSQL | db.t3.micro Multi-AZ | ~$35 |
| Elastic IPs (2) | For NAT Gateways | ~$7 |
| ECS Cluster | No cost (serverless) | $0 |
| ECR Repositories | No cost until images stored | $0 |
| Total | | ~$107/month |
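Cost Optimization Tips:
- Use 1 NAT Gateway instead of 2 for development (saves ~$32/month)
- Use db.t3.micro for development
Troubleshooting:
Problem: Terraform reports conflicting or overlapping subnet CIDR blocks.
Solution: Ensure your CIDR blocks don’t overlap. The default values should work.
Problem: RDS creation appears to hang or times out.
Solution: RDS Multi-AZ can take 15-20 minutes. Be patient. If it fails, check: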
aws rds describe-db-instances \
--db-instance-identifier ecs-microservices-postgres \
--region ap-south-1
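Problem: Terraform fails with access-denied errors.
Solution: Ensure your AWS credentials have sufficient permissions to create the VPC, RDS, ECR, ECS, IAM, and CloudWatch Logs resources defined above.
Never commit terraform.tfvars to version control.
Once infrastructure is complete: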
Proceed to Part 3: Application Containerization where we’ll build the Flask and Nginx applications and create production-ready Docker images.
The terraform.tfvars file contains sensitive data. Add it to .gitignore:
echo "terraform.tfvars" >> .gitignore
echo "*.tfstate*" >> .gitignore
echo ".terraform/" >> .gitignore
For team collaboration, configure an S3 backend (commented out in provider.tf):
backend "s3" {
bucket = "your-terraform-state-bucket"
key = "ecs-cicd/terraform.tfstate"
region = "ap-south-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
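The bucket and lock table named in that backend block must exist before `terraform init` can use them. A minimal sketch of creating them follows, meant to be applied once from a separate bootstrap configuration. The bucket and table names are the placeholders from the backend block, and the resource labels are illustrative.

```hcl
# Bootstrap resources for remote state; apply from a separate configuration.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "your-terraform-state-bucket" # placeholder; bucket names are globally unique
}

# Versioning keeps earlier state files recoverable.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The S3 backend's DynamoDB locking expects a string hash key named "LockID".
resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Once these exist, uncomment the backend block in provider.tf and run `terraform init` again to migrate the local state.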
To create multiple environments (dev/staging/prod), use Terraform workspaces or separate state files.
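As a sketch of the workspace approach, the current workspace name can drive naming and per-environment settings through locals. The `env_settings` values below are assumptions for illustration, not recommendations from this series, and the local names are hypothetical.

```hcl
locals {
  # Fall back to var.environment when running in the "default" workspace.
  env = terraform.workspace == "default" ? var.environment : terraform.workspace

  # Prefix that keeps resource names distinct per environment.
  name_prefix = "${var.project_name}-${local.env}"

  # Illustrative per-environment settings.
  env_settings = {
    dev     = { multi_az = false, nat_gateway_count = 1 }
    staging = { multi_az = true, nat_gateway_count = 2 }
    prod    = { multi_az = true, nat_gateway_count = 2 }
  }

  # Unknown workspace names fall back to the dev settings.
  settings = lookup(local.env_settings, local.env, local.env_settings["dev"])
}
```

After `terraform workspace new staging`, resources could use `local.name_prefix` in their names and `local.settings.multi_az` for the RDS `multi_az` argument, with each workspace keeping its own state.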
Always keep a backup of your terraform.tfstate file. It’s critical for managing infrastructure.
This infrastructure provides:
✅ Production-ready foundation for ECS microservices
✅ High availability across multiple availability zones
✅ Security best practices with network isolation and least privilege
✅ Cost optimization with detailed cost breakdown
✅ Infrastructure as Code with Terraform
✅ Comprehensive monitoring and logging setup
✅ Scalable architecture ready for container workloads
This foundation will serve as the basis for deploying containerized applications in the next phase. The infrastructure is designed to be secure, scalable, and cost-effective for production workloads.
Ready for the next phase? In Part 3, we’ll containerize our applications and prepare them for deployment to this infrastructure!
Questions or feedback? Feel free to reach out in the comments below!