Building Production-Grade ECS Microservices with CI/CD - Part 6: Production Cleanup

Complete guide to safely tearing down production AWS infrastructure to avoid ongoing charges, covering automated cleanup, cost optimization, and data preservation strategies.

Welcome to the final part of our comprehensive series on building production-grade microservices on AWS ECS. In this installment, we’ll cover the essential process of safely tearing down your AWS infrastructure to avoid ongoing charges while preserving important data and configurations.

What We’ll Cover

In this cleanup guide, we’ll provide you with:

  1. Cost Analysis - Understanding what resources cost and why cleanup matters
  2. Safe Cleanup Procedures - Step-by-step resource deletion in the correct order
  3. Data Preservation - Creating snapshots and backups before deletion
  4. Automated Cleanup - Scripts and Terraform commands for efficient cleanup
  5. Cost Optimization - Strategies for reducing costs while keeping infrastructure
  6. Verification Procedures - Ensuring complete cleanup and cost elimination

Production Cost Analysis

Understanding the cost implications of your infrastructure is crucial for effective cleanup planning.

Monthly Cost Breakdown

| Resource | Configuration | Monthly Cost | Priority |
|---|---|---|---|
| NAT Gateways | 2x Multi-AZ | ~$65 | ⚠️ Critical |
| RDS PostgreSQL | Multi-AZ, db.t3.micro | ~$35 | ⚠️ High |
| ECS Fargate Tasks | 2 Nginx + 2 Flask + 1 Redis | ~$40 | ⚠️ High |
| Application Load Balancer | Internet-facing | ~$20 | ⚠️ Medium |
| Elastic IPs | 2x Unattached | ~$7 | ⚠️ Medium |
| CloudWatch Logs | 7-day retention | ~$2 | ⚠️ Low |
| Data Transfer | ALB to ECS | ~$5 | ⚠️ Low |
| ECR Storage | Container images | ~$1 | ⚠️ Low |
| Total | Complete Infrastructure | ~$175/month | 🚨 Urgent |
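The total and the daily run rate follow directly from the line items. Here is a minimal sketch of that arithmetic, assuming a 30-day month; the function names are illustrative and the figures are the estimates from the table, not live billing data:

```shell
#!/bin/bash
# Sum the monthly estimates (USD) from the cost table:
# NAT, RDS, Fargate, ALB, EIPs, logs, data transfer, ECR.
monthly_total() {
    total=0
    for cost in 65 35 40 20 7 2 5 1; do
        total=$((total + cost))
    done
    echo "$total"
}

# Daily run rate in whole cents, assuming a 30-day month.
daily_cents() {
    echo $(( $(monthly_total) * 100 / 30 ))
}

echo "Monthly total (USD): $(monthly_total)"
echo "Daily run rate (cents): $(daily_cents)"
```

The result is 175 per month and 583 cents per day, which is where the $5-6/day figure used later in this guide comes from.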

Cost Optimization Priority

Immediate Action Required:

  1. NAT Gateways - Delete immediately (saves $65/month)
  2. RDS Database - Stop or delete (saves $35/month)
  3. ECS Services - Scale to zero (saves $40/month)
  4. Load Balancer - Delete when not needed (saves $20/month)

Production Cleanup Strategy

Critical Cleanup Order

⚠️ IMPORTANT: Follow this exact order to prevent dependency conflicts and ensure complete cleanup:

Phase 1: Application Layer (5-10 minutes)

  1. ECS Services - Scale down to 0 tasks
  2. Application Load Balancer - Delete ALB (a target group cannot be deleted while a listener still references it)
  3. Target Groups - Delete ALB target groups
  4. ECS Task Definitions - Optional cleanup

Phase 2: Data Layer (10-15 minutes)

  1. RDS Database - Create final snapshot, then delete
  2. ECR Images - Optional cleanup (keep if needed)

Phase 3: Networking Layer (5-10 minutes)

  1. NAT Gateways - ⚠️ CRITICAL - Delete immediately (saves $65/month)
  2. Elastic IPs - Release unattached IPs
  3. VPC and Networking - Delete subnets, route tables, security groups

Phase 4: Infrastructure Layer (5-10 minutes)

  1. ECS Cluster - Delete cluster
  2. Service Discovery - Delete Cloud Map namespace
  3. CloudWatch Logs - Delete log groups
  4. IAM Roles - Delete custom roles and policies

Cleanup Time Estimates

  • Automated (Terraform): 15-20 minutes
  • Manual (AWS Console): 30-40 minutes
  • Scripted (Bash): 20-25 minutes

Method 1: Terraform Destroy

The safest and most efficient method for complete infrastructure cleanup.

Prerequisites

  • Terraform state file available
  • AWS credentials configured
  • All resources created with Terraform

Step 1: Navigate to Terraform Directory

cd terraform

Step 2: Review What Will Be Deleted

terraform plan -destroy

Review the output carefully. You should see all resources planned for destruction.
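Before destroying, it can help to confirm that the expensive resources are actually tracked in Terraform state, since anything missing from state will survive the destroy. This is a rough sketch, assuming the stack uses the standard AWS provider resource types aws_nat_gateway, aws_db_instance, and aws_lb; the function name is illustrative:

```shell
#!/bin/bash
# List state entries for the three largest line items in the cost table.
# The pattern matches the resource type exactly, so aws_lb_listener and
# aws_lb_target_group are not reported. Resources inside modules
# (module.x.aws_nat_gateway.y) are matched as well.
list_costly_resources() {
    terraform state list \
        | grep -E '(^|\.)(aws_nat_gateway|aws_db_instance|aws_lb)\.' \
        || true   # no match is not an error
}

# Usage (from the terraform directory):
#   list_costly_resources
```

If a NAT Gateway or the database is absent from this list, plan to remove it by hand with Method 2.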

Step 3: Create a Final RDS Snapshot

Before destroying, create a final snapshot:

# Compute the snapshot name once so both commands use the same identifier,
# even if the date changes while the snapshot is being taken
SNAPSHOT_ID="ecs-microservices-final-snapshot-$(date +%Y%m%d)"

aws rds create-db-snapshot \
  --db-instance-identifier ecs-microservices-postgres \
  --db-snapshot-identifier "$SNAPSHOT_ID" \
  --region ap-south-1

# Wait for snapshot to complete
aws rds wait db-snapshot-completed \
  --db-snapshot-identifier "$SNAPSHOT_ID" \
  --region ap-south-1
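Since the destroy step is irreversible, it is worth checking the snapshot status explicitly before moving on. Here is a small sketch of such a guard; the function names are illustrative, and the snapshot identifier you pass in is whatever name you used when creating the snapshot:

```shell
#!/bin/bash
AWS_REGION="${AWS_REGION:-ap-south-1}"

# Print the status of a manual DB snapshot (for example "creating" or "available").
snapshot_status() {
    aws rds describe-db-snapshots \
        --db-snapshot-identifier "$1" \
        --region "$AWS_REGION" \
        --query 'DBSnapshots[0].Status' \
        --output text
}

# Succeed only when the snapshot is usable for a later restore.
require_snapshot() {
    status=$(snapshot_status "$1")
    if [ "$status" = "available" ]; then
        echo "Snapshot $1 is available; safe to continue"
    else
        echo "Snapshot $1 is '$status'; do not destroy yet" >&2
        return 1
    fi
}

# Usage:
#   require_snapshot ecs-microservices-final-snapshot-YYYYMMDD && terraform destroy
```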

Step 4: Modify RDS to Skip Final Snapshot (if needed)

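If you don’t want Terraform to take its own final snapshot during destroy, say so in the configuration. Otherwise the destroy fails unless the RDS resource also sets final_snapshot_identifier.

Edit main.tf and set this on the RDS instance resource: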

skip_final_snapshot = true

Then apply:

terraform apply -auto-approve

Step 5: Destroy All Resources

terraform destroy

Type yes when prompted.

This will take 15-20 minutes due to:

  • RDS deletion (~10 minutes)
  • NAT Gateway deletion (~5 minutes)
  • VPC cleanup

Step 6: Verify Cleanup

# Check ECS clusters
aws ecs list-clusters --region ap-south-1

# Check RDS instances
aws rds describe-db-instances --region ap-south-1

# Check VPCs (should only show default VPC)
aws ec2 describe-vpcs --region ap-south-1

# Check NAT Gateways
aws ec2 describe-nat-gateways --region ap-south-1 \
  --filter "Name=state,Values=available"

# Check ALBs
aws elbv2 describe-load-balancers --region ap-south-1

All should return empty or only default resources.
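The same checks can be folded into one pass/fail function. This is a sketch under the assumption that the region holds nothing except this project, so any remaining resource counts as a leftover; the function names are illustrative:

```shell
#!/bin/bash
AWS_REGION="${AWS_REGION:-ap-south-1}"

# Print one line per resource type; return non-zero if anything is left.
report() {
    if [ "$2" = "0" ]; then
        echo "OK: no $1 remaining"
    else
        echo "LEFTOVER: $2 $1 remaining"
        return 1
    fi
}

verify_cleanup() {
    status=0
    report "ECS clusters" "$(aws ecs list-clusters --region "$AWS_REGION" \
        --query 'length(clusterArns)' --output text)" || status=1
    report "RDS instances" "$(aws rds describe-db-instances --region "$AWS_REGION" \
        --query 'length(DBInstances)' --output text)" || status=1
    # Include pending and deleting gateways, not only available ones.
    report "NAT gateways" "$(aws ec2 describe-nat-gateways --region "$AWS_REGION" \
        --filter "Name=state,Values=pending,available,deleting" \
        --query 'length(NatGateways)' --output text)" || status=1
    report "load balancers" "$(aws elbv2 describe-load-balancers --region "$AWS_REGION" \
        --query 'length(LoadBalancers)' --output text)" || status=1
    return "$status"
}

# Usage:
#   verify_cleanup || echo "Some resources are still billing"
```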

Method 2: Manual Cleanup via AWS Console

For situations where Terraform is not available or you need granular control over the cleanup process.

Prerequisites

  • AWS Console access
  • Administrator permissions
  • Understanding of resource dependencies

Step 1: Stop ECS Services

  1. Go to ECS → Clusters → ecs-microservices-cluster

  2. Click Services tab

  3. For each service (nginx, flask-app, redis):

    • Select the service
    • Click Update
    • Set Desired tasks to 0
    • Click Update
    • Wait for tasks to stop
  4. After all tasks have stopped, delete the services:

    • Select each service
    • Click Delete
    • Confirm deletion

⏱️ Time: ~5 minutes

Step 2: Delete Application Load Balancer

  1. Go to EC2 → Load Balancers
  2. Select ecs-microservices-alb
  3. Actions → Delete
  4. Type “confirm” and delete

⏱️ Time: ~2 minutes

Step 3: Delete Target Groups

  1. Go to EC2 → Target Groups
  2. Select ecs-microservices-nginx-tg
  3. Actions → Delete
  4. Confirm

⏱️ Time: ~1 minute

Step 4: Delete RDS Database

⚠️ Critical: Create final snapshot first if you need the data!

  1. Go to RDS → Databases
  2. Select ecs-microservices-postgres
  3. Actions → Delete
  4. Choose one:
    • Create final snapshot (recommended): Enter snapshot name
    • Skip final snapshot: Check the box (data will be lost)
  5. Type “delete me” to confirm
  6. Click Delete

⏱️ Time: ~10-15 minutes

Step 5: Delete NAT Gateways (Important - Saves $65/month!)

⚠️ This is the most expensive resource! Delete immediately to stop charges.

  1. Go to VPC → NAT Gateways
  2. Select both NAT gateways (should be 2)
  3. Actions → Delete NAT gateway
  4. Type “delete” and confirm

⏱️ Time: ~5 minutes

Step 6: Release Elastic IPs

  1. Go to VPC → Elastic IPs
  2. Select the allocated IPs (should be 2)
  3. Actions → Release Elastic IP addresses
  4. Confirm

⚠️ Note: You can only release EIPs after NAT Gateways are deleted.

⏱️ Time: ~1 minute
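The console steps above have a CLI equivalent. The sketch below releases every Elastic IP in the region that has no association, which assumes the account holds no unassociated addresses you want to keep; the function name is illustrative:

```shell
#!/bin/bash
AWS_REGION="${AWS_REGION:-ap-south-1}"

# Release all Elastic IPs that are not associated with any resource.
# WARNING: this is region-wide, not limited to this project.
release_unassociated_eips() {
    ids=$(aws ec2 describe-addresses --region "$AWS_REGION" \
        --query 'Addresses[?AssociationId==`null`].AllocationId' \
        --output text)
    for id in $ids; do
        echo "Releasing $id"
        aws ec2 release-address --allocation-id "$id" --region "$AWS_REGION"
    done
}

# Usage (after the NAT Gateways have finished deleting):
#   release_unassociated_eips
```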

Step 7: Delete ECS Cluster

  1. Go to ECS → Clusters
  2. Select ecs-microservices-cluster
  3. Delete Cluster
  4. Confirm

⏱️ Time: ~2 minutes

Step 8: Delete Service Discovery Namespace

  1. Go to AWS Cloud Map → Namespaces
  2. Select ecs-microservices.local
  3. Delete
  4. Confirm

⏱️ Time: ~1 minute

Step 9: Delete CloudWatch Log Groups

  1. Go to CloudWatch → Logs → Log groups
  2. Select these log groups:
    • /ecs/ecs-microservices/flask-app
    • /ecs/ecs-microservices/nginx
    • /ecs/ecs-microservices/redis
    • /ecs/ecs-microservices/exec
  3. Actions → Delete log group(s)
  4. Confirm

⏱️ Time: ~1 minute

Step 10: Delete ECR Repositories (Optional)

⚠️ Only delete if you don’t need the images anymore!

  1. Go to ECR → Repositories
  2. Select repositories:
    • ecs-microservices/flask-app
    • ecs-microservices/nginx
  3. Delete
  4. Type “delete” and confirm

⏱️ Time: ~1 minute

Step 11: Delete VPC and Networking

Wait for NAT Gateways to finish deleting, then:

  1. Go to VPC → Your VPCs
  2. Select ecs-microservices-vpc
  3. Actions → Delete VPC
  4. Confirm

This will delete:

  • All subnets
  • Route tables
  • Internet gateway
  • Security groups
  • VPC

⚠️ If deletion fails, manually delete in this order:

  1. Route table associations
  2. Subnets
  3. Route tables
  4. Internet gateway (detach first)
  5. Security groups (delete non-default)
  6. VPC

⏱️ Time: ~5 minutes

Step 12: Delete IAM Roles

  1. Go to IAM → Roles
  2. Search for “ecs-microservices”
  3. Select these roles:
    • ecs-microservices-ecs-task-execution-role
    • ecs-microservices-ecs-task-role
  4. Delete
  5. Confirm

⏱️ Time: ~2 minutes

Step 13: Delete Auto Scaling Policies (if created manually)

  1. Go to Application Auto Scaling
  2. Select policies related to ECS services
  3. Delete them

⏱️ Time: ~1 minute

Method 3: Automated Cleanup Script

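For a repeatable cleanup, use this script. It scales down and deletes the ECS services with the AWS CLI, then hands the rest of the teardown to Terraform.

Prerequisites

  • AWS CLI configured
  • Bash shell available
  • Appropriate AWS permissions
  • Terraform installed, with the script run from the terraform directory

Save the following as cleanup.sh:

#!/bin/bash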

# Automated cleanup script
set -e

AWS_REGION="ap-south-1"
PROJECT_NAME="ecs-microservices"

echo "🧹 Starting cleanup process..."

# Step 1: Scale down ECS services to 0
echo "⏬ Scaling down ECS services..."
for service in nginx flask-app redis; do
    aws ecs update-service \
        --cluster ${PROJECT_NAME}-cluster \
        --service ${PROJECT_NAME}-${service} \
        --desired-count 0 \
        --region ${AWS_REGION} \
        > /dev/null 2>&1 || echo "Service ${service} not found or already deleted"
done

sleep 30

# Step 2: Delete ECS services
echo "🗑️ Deleting ECS services..."
for service in nginx flask-app redis; do
    aws ecs delete-service \
        --cluster ${PROJECT_NAME}-cluster \
        --service ${PROJECT_NAME}-${service} \
        --force \
        --region ${AWS_REGION} \
        > /dev/null 2>&1 || echo "Service ${service} not found or already deleted"
done

echo "⏳ Waiting for services to be deleted (60 seconds)..."
sleep 60

# Step 3: Use Terraform to destroy everything else
echo "🔥 Running terraform destroy..."
terraform destroy -auto-approve

echo "✅ Cleanup completed!"
echo "💰 Check AWS Console to verify all resources are deleted"

Make it executable and run:

chmod +x cleanup.sh
./cleanup.sh
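The script relies on fixed sleep calls, which can be too short on a slow day and are wasted time on a fast one. A possible refinement is to poll the running task count instead. This is a sketch only: the function name, attempt limit, and polling interval are assumptions, not part of the original script.

```shell
#!/bin/bash
AWS_REGION="${AWS_REGION:-ap-south-1}"

# Poll a service until it reports zero running tasks.
# Arguments: cluster, service, max attempts (default 30), seconds between polls (default 10).
wait_for_zero_tasks() {
    cluster="$1"
    service="$2"
    attempts="${3:-30}"
    interval="${4:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        running=$(aws ecs describe-services \
            --cluster "$cluster" --services "$service" \
            --region "$AWS_REGION" \
            --query 'services[0].runningCount' --output text)
        if [ "$running" = "0" ]; then
            echo "$service: all tasks stopped"
            return 0
        fi
        echo "$service: $running task(s) still running"
        sleep "$interval"
        i=$((i + 1))
    done
    echo "$service: timed out waiting for tasks to stop" >&2
    return 1
}

# Usage, in place of "sleep 30":
#   wait_for_zero_tasks ecs-microservices-cluster ecs-microservices-nginx
```

A service that does not exist never reports 0, so the function runs into its timeout in that case. The built-in aws ecs wait services-stable waiter is an alternative if you prefer not to maintain a loop.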

Production Verification Checklist

Critical Resources Verification

⚠️ HIGH PRIORITY - Verify these are deleted to avoid charges:

  • NAT Gateways deleted ⚠️ Critical for cost savings ($65/month)
  • RDS Database deleted ⚠️ High priority ($35/month)
  • ECS Services stopped ⚠️ High priority ($40/month)
  • Application Load Balancer deleted ⚠️ Medium priority ($20/month)
  • Elastic IPs released ⚠️ Medium priority ($7/month)

Infrastructure Resources Verification

  • ECS Cluster deleted
  • Target Groups deleted
  • VPC and Networking deleted
  • Security Groups deleted (except default)
  • Service Discovery Namespace deleted
  • CloudWatch Log Groups deleted
  • IAM Roles deleted (custom roles only)

Optional Resources Verification

  • ECR Repositories deleted (if desired)
  • ECS Task Definitions deleted (optional)
  • Auto Scaling Policies deleted
  • Route53 Records deleted (if created)

Production Cost Verification

AWS Cost Explorer Verification

  1. Navigate to Cost Explorer:

    • Go to AWS Cost Management → Cost Explorer
    • Select Daily costs view
    • Filter by region: ap-south-1
    • Set date range to last 7 days
  2. Expected Results:

    • Before cleanup: $5-6/day ($175/month)
    • After cleanup: $0.10-0.50/day ($3-15/month)
    • Target: Near $0 within 24-48 hours

Cost Monitoring Commands

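# Check costs for one month (example: January 2024; adjust the dates to your billing period)
# The End date is exclusive, so use the first day of the following month
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --region us-east-1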

# Check for running resources
aws ec2 describe-instances --region ap-south-1 --query 'Reservations[*].Instances[*].[InstanceId,State.Name]'
aws rds describe-db-instances --region ap-south-1 --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceStatus]'
aws ecs list-clusters --region ap-south-1
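Hardcoded dates go stale quickly. One way to avoid editing them is to build the month-to-date period from the current date, as in this sketch; the function names are illustrative, and the query path assumes the MONTHLY, BlendedCost request shown above:

```shell
#!/bin/bash
# Build a Cost Explorer time period from the first of the month up to a
# given date (default: today). The End date is exclusive.
month_to_date_period() {
    today="${1:-$(date +%Y-%m-%d)}"
    # ${today%-*} strips the trailing "-DD", leaving "YYYY-MM".
    echo "Start=${today%-*}-01,End=${today}"
}

# Print the month-to-date blended cost as a plain number.
month_to_date_cost() {
    aws ce get-cost-and-usage \
        --time-period "$(month_to_date_period)" \
        --granularity MONTHLY \
        --metrics BlendedCost \
        --query 'ResultsByTime[0].Total.BlendedCost.Amount' \
        --output text \
        --region us-east-1
}

# Usage:
#   month_to_date_period 2024-03-15
#   month_to_date_cost
```

On the first day of a month Start and End are the same date. Cost Explorer needs Start to be earlier than End, so pass an explicit later date in that case.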

Production Troubleshooting

Issue: VPC Won’t Delete

Error: “The vpc has dependencies and cannot be deleted”

Root Cause: Dependencies still exist in the VPC

Solution: Delete resources in this exact order:

  1. NAT Gateways (wait until fully deleted - 5-10 minutes)
  2. Elastic IPs (only after NAT Gateways are deleted)
  3. Network interfaces (ENIs from ECS tasks)
  4. Subnets (all custom subnets)
  5. Internet Gateway (detach first, then delete)
  6. Route tables (custom route tables)
  7. Security groups (non-default groups)
  8. Finally, VPC
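Leftover network interfaces are a frequent cause of this error, and they are easy to miss in the console. Here is a sketch for listing whatever is still attached to the VPC; the function name is illustrative and the VPC ID is one you supply:

```shell
#!/bin/bash
AWS_REGION="${AWS_REGION:-ap-south-1}"

# List the network interfaces that still live in a VPC, with their
# status and description (the description usually names the owner).
list_vpc_enis() {
    aws ec2 describe-network-interfaces \
        --filters "Name=vpc-id,Values=$1" \
        --region "$AWS_REGION" \
        --query 'NetworkInterfaces[].[NetworkInterfaceId,Status,Description]' \
        --output text
}

# Usage:
#   list_vpc_enis vpc-0123456789abcdef0
```

An empty result means no interfaces remain, so the blocker is something else in the list above.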

Issue: NAT Gateway Stuck in “Deleting” State

Symptoms: NAT Gateway shows “deleting” for more than 10 minutes

Solution:

  • This is normal behavior
  • Can take 5-15 minutes depending on AWS load
  • Do not delete again - wait for completion
  • Check the gateway's state and failure message with aws ec2 describe-nat-gateways
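To confirm that a deletion is still progressing rather than guessing, you can query the gateway directly or block until it is gone. This is a minimal sketch; the function names are illustrative and the gateway ID is one you supply:

```shell
#!/bin/bash
AWS_REGION="${AWS_REGION:-ap-south-1}"

# Print the current state of a NAT Gateway
# (pending, available, deleting, deleted, or failed).
nat_gateway_state() {
    aws ec2 describe-nat-gateways \
        --nat-gateway-ids "$1" \
        --region "$AWS_REGION" \
        --query 'NatGateways[0].State' \
        --output text
}

# Block until the gateway reaches the deleted state, using the CLI's built-in waiter.
wait_nat_deleted() {
    aws ec2 wait nat-gateway-deleted \
        --nat-gateway-ids "$1" \
        --region "$AWS_REGION"
}

# Usage:
#   nat_gateway_state nat-0123456789abcdef0
#   wait_nat_deleted nat-0123456789abcdef0 && echo "NAT Gateway deleted"
```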

Issue: Security Group Deletion Fails

Error: “has dependent objects”

Root Cause: Network interfaces or instances still using the security group

Solution:

  1. Delete ECS services first (removes ENIs)
  2. Wait 5-10 minutes for ENIs to be removed
  3. Check for orphaned ENIs in EC2 console
  4. Delete ENIs manually if they exist
  5. Try security group deletion again

Issue: RDS Deletion Hangs

Symptoms: RDS instance stuck in “deleting” state

Root Cause: Final snapshot creation or Multi-AZ cleanup

Solution:

  • Normal behavior for Multi-AZ instances (10-15 minutes)
  • Monitor RDS console for progress
  • Check CloudWatch logs for any errors
  • Do not interrupt the deletion process

Issue: Terraform Destroy Fails

Common Causes: State drift, resource dependencies, permission issues

Solution:

  1. Refresh state: terraform apply -refresh-only (terraform refresh on older Terraform versions)
  2. Retry destroy: terraform destroy
  3. Manual cleanup: Delete failed resources manually
  4. Remove from state: terraform state rm <resource>
  5. Retry destroy: terraform destroy again
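The refresh-and-retry part of this procedure can be automated, because many destroy failures are timing issues that clear on a second pass. This is a sketch, assuming it runs from the terraform directory; the function name and the default of three attempts are arbitrary choices:

```shell
#!/bin/bash
# Run terraform destroy up to N times, refreshing state between attempts.
# Argument: maximum number of attempts (default 3).
destroy_with_retry() {
    max="${1:-3}"
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        echo "terraform destroy: attempt $attempt of $max"
        if terraform destroy -auto-approve; then
            return 0
        fi
        # Re-sync state with what actually exists before trying again.
        terraform apply -refresh-only -auto-approve || true
        attempt=$((attempt + 1))
    done
    echo "destroy still failing after $max attempts; clean up manually" >&2
    return 1
}

# Usage:
#   destroy_with_retry 3
```

If every attempt fails on the same resource, fall back to the manual steps: delete that resource by hand and remove it from state.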

Issue: ECS Services Won’t Scale Down

Symptoms: Services remain at desired count despite update

Solution:

  1. Force new deployment: --force-new-deployment
  2. Check task health: Ensure tasks are healthy
  3. Wait for stability: Allow 5-10 minutes
  4. Scale down gradually: Reduce desired count in steps

Cost Optimization Strategies

Strategy 1: Partial Cleanup (Keep Infrastructure)

For temporary cost reduction while preserving infrastructure:

Quick Cost Reduction (~$140/month savings)

1. Stop ECS Services (saves ~$40/month):

# Scale down all services to 0
aws ecs update-service --cluster ecs-microservices-cluster \
  --service ecs-microservices-flask-app --desired-count 0 --region ap-south-1
aws ecs update-service --cluster ecs-microservices-cluster \
  --service ecs-microservices-nginx --desired-count 0 --region ap-south-1
aws ecs update-service --cluster ecs-microservices-cluster \
  --service ecs-microservices-redis --desired-count 0 --region ap-south-1

2. Stop RDS Database (saves ~$35/month):

aws rds stop-db-instance \
  --db-instance-identifier ecs-microservices-postgres \
  --region ap-south-1

⚠️ Note: RDS auto-starts after 7 days

3. Delete NAT Gateways (saves ~$65/month):

# Delete NAT Gateways (requires Terraform apply to recreate)
aws ec2 delete-nat-gateway --nat-gateway-id <nat-gateway-id> --region ap-south-1
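The gateway IDs do not have to be looked up by hand. The sketch below deletes every NAT Gateway in the region that is in the available state, which assumes the region hosts only this project's gateways; the function name is illustrative:

```shell
#!/bin/bash
AWS_REGION="${AWS_REGION:-ap-south-1}"

# Delete all NAT Gateways currently in the "available" state.
# WARNING: this is region-wide, not limited to this project's VPC.
delete_available_nat_gateways() {
    ids=$(aws ec2 describe-nat-gateways --region "$AWS_REGION" \
        --filter "Name=state,Values=available" \
        --query 'NatGateways[].NatGatewayId' \
        --output text)
    for id in $ids; do
        echo "Deleting $id"
        aws ec2 delete-nat-gateway --nat-gateway-id "$id" \
            --region "$AWS_REGION" > /dev/null
    done
}

# Usage:
#   delete_available_nat_gateways
```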

Strategy 2: Development Environment Optimization

For long-term development use:

1. Use Single-AZ RDS (saves ~$17/month):

  • Change from Multi-AZ to Single-AZ
  • Accept reduced availability for dev environment

2. Use Single NAT Gateway (saves ~$32/month):

  • Delete one NAT Gateway
  • Accept reduced availability for dev environment

3. Reduce ECS Task Counts (saves ~$20/month):

  • Use 1 task per service instead of 2
  • Accept reduced capacity for dev environment

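Restart After Partial Cleanup (Strategy 1):

# If the NAT Gateways were deleted, recreate them first with terraform apply

# Start RDS and wait until it is available again
aws rds start-db-instance \
  --db-instance-identifier ecs-microservices-postgres \
  --region ap-south-1
aws rds wait db-instance-available \
  --db-instance-identifier ecs-microservices-postgres \
  --region ap-south-1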

# Scale up ECS services
aws ecs update-service --cluster ecs-microservices-cluster \
  --service ecs-microservices-redis --desired-count 1 --region ap-south-1
aws ecs update-service --cluster ecs-microservices-cluster \
  --service ecs-microservices-flask-app --desired-count 2 --region ap-south-1
aws ecs update-service --cluster ecs-microservices-cluster \
  --service ecs-microservices-nginx --desired-count 2 --region ap-south-1

Production Best Practices

Critical Cleanup Reminders

⚠️ HIGH PRIORITY - These actions save the most money:

  1. NAT Gateways - Delete immediately (saves $65/month)
  2. RDS Database - Stop or delete (saves $35/month)
  3. ECS Services - Scale to zero (saves $40/month)
  4. Load Balancer - Delete when not needed (saves $20/month)
  5. Elastic IPs - Release unattached IPs (saves $7/month)

Data Preservation Best Practices

Before Deletion:

  • Create RDS final snapshot if you need the data
  • Export ECR images if you want to keep them
  • Save Terraform state for future recreation
  • Document configurations for reference

Cost Monitoring Best Practices

After Cleanup:

  • Check Cost Explorer within 24-48 hours
  • Set up AWS Budgets for future cost alerts
  • Monitor for orphaned resources weekly
  • Review costs monthly to catch any surprises

Cleanup Time Estimates

| Method | Time Required | Complexity | Recommended For |
|---|---|---|---|
| Terraform Destroy | 15-20 minutes | Low | Production use |
| Manual Console | 30-40 minutes | Medium | Learning/understanding |
| Automated Script | 20-25 minutes | Low | Repetitive cleanup |

What to Keep for Future Use

Low-Cost Resources to Preserve

  • ECR Repositories (~$0.10/GB-month) - Keep container images
  • S3 Bucket (~$0.023/GB-month) - Terraform state storage
  • IAM Users and Policies (no cost) - Access management
  • RDS Snapshots (~$0.095/GB-month) - Database backups
  • Route53 Hosted Zones ($0.50/month) - DNS management

High-Cost Resources to Delete

  • NAT Gateways ($65/month) - Delete immediately
  • RDS Multi-AZ ($35/month) - Delete or use Single-AZ
  • ECS Fargate Tasks ($40/month) - Scale to zero
  • Application Load Balancer ($20/month) - Delete when not needed

Final Production Checklist

Pre-Cleanup Verification

  • Data backed up (RDS snapshots, ECR images)
  • Terraform state saved (if using Terraform)
  • Documentation updated (configurations, lessons learned)
  • Team notified (if shared infrastructure)

Post-Cleanup Verification

  • AWS Console verified (all resources deleted)
  • Cost Explorer checked (costs near $0)
  • Billing alerts set (prevent future surprises)
  • Cleanup documented (for future reference)

Series Completion

🎉 Congratulations! You’ve successfully completed the entire production-grade ECS microservices series:

  • Part 1: Infrastructure setup with Terraform
  • Part 2: Application containerization with Docker
  • Part 3: ECS deployment with auto-scaling
  • Part 4: Production deployment and monitoring
  • Part 5: CI/CD pipeline with GitHub Actions
  • Part 6: Production cleanup and cost optimization

Key Takeaways

Cost Management

  1. NAT Gateways are the most expensive resource - delete first
  2. RDS Multi-AZ doubles database costs - use Single-AZ for dev
  3. ECS Fargate charges for the vCPU and memory of each running task - scale to zero when not needed
  4. Load Balancers charge hourly - delete when not in use
  5. Elastic IPs are billed by the hour, even when unattached - release them

Cleanup Best Practices

  1. Follow the exact order to prevent dependency conflicts
  2. Create snapshots before deleting databases
  3. Verify cleanup in AWS Console and Cost Explorer
  4. Set up monitoring to prevent future cost surprises
  5. Document everything for future reference

Production Readiness

  1. Automated cleanup with Terraform is most reliable
  2. Manual cleanup provides better understanding
  3. Scripted cleanup balances automation and control
  4. Cost monitoring prevents unexpected charges
  5. Data preservation ensures no data loss

Remember: these resources are billed by the hour or by the second, so the sooner you delete them, the less you pay!

Questions or feedback? Feel free to reach out in the comments below!
