Building Production-Grade ECS Microservices with CI/CD - Part 5: CI/CD Pipeline

Complete guide to implementing production-ready CI/CD pipelines for ECS microservices using GitHub Actions, covering automated builds, deployments, testing, and rollback strategies.

Welcome to Part 5 of our comprehensive series on building production-grade microservices on AWS ECS. In this installment, we’ll implement a complete CI/CD pipeline using GitHub Actions that automates the deployment process, from code commit to production deployment.

What We’ll Build

In this phase, we’ll create a production-ready CI/CD pipeline that automatically:

  1. Builds Docker Images - Automated container image creation with versioning
  2. Pushes to ECR - Secure image storage in Amazon Elastic Container Registry
  3. Updates ECS Services - Automated task definition updates and service deployment
  4. Validates Deployments - Service stability checks and post-deployment health checks
  5. Enables Rollbacks - Quick rollback capabilities for failed deployments
  6. Supports Multiple Environments - Dev, staging, and production

CI/CD Pipeline Architecture

Our production-ready CI/CD pipeline follows modern DevOps practices:

┌─────────────────────────────────────────────────────────────────┐
│                    Developer Workflow                           │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   Code      │    │   Pull      │    │   Merge     │          │
│  │   Commit    │───▶│   Request   │───▶│   to Main   │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                  GitHub Actions Pipeline                        │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   Code      │    │   Build     │    │   Test      │          │
│  │   Checkout  │───▶│   Images    │───▶│   & Lint    │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   Push to   │    │   Update    │    │   Deploy    │          │
│  │   ECR       │───▶│   Task      │───▶│   to ECS    │          │
│  │             │    │   Defs      │    │             │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                    AWS Infrastructure                           │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   ECR       │    │   ECS       │    │   ALB       │          │
│  │   Images    │───▶│   Services  │───▶│   Health    │          │
│  │   Stored    │    │   Updated   │    │   Checks    │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└─────────────────────────────────────────────────────────────────┘

Prerequisites

Before we begin, ensure you have completed the previous phases:

Infrastructure Requirements

  • Part 1 Complete: VPC, subnets, security groups, RDS database, ECR repositories
  • Part 2 Complete: ECS cluster, IAM roles, CloudWatch log groups
  • Part 3 Complete: Containerized applications tested locally
  • Part 4 Complete: ECS services deployed and running

GitHub Requirements

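  • GitHub Repository: A repository for the project, with Terraform code under terraform/ and the services under application/ (the paths the workflows below expect)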
  • GitHub Actions: Enabled for the repository
  • Branch Protection: Optional but recommended for production

AWS Permissions

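Ensure your AWS credentials have the following permissions:

  • ECR: ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:BatchGetImage, ecr:PutImage, and the layer upload actions (ecr:InitiateLayerUpload, ecr:UploadLayerPart, ecr:CompleteLayerUpload)
  • ECS: ecs:DescribeServices, ecs:DescribeTaskDefinition, ecs:UpdateService, ecs:RegisterTaskDefinition
  • IAM: iam:PassRole (to pass the task execution and task roles when registering task definitions)
  • ELB: elasticloadbalancing:DescribeLoadBalancers (to look up the ALB DNS name after deployment)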

Step-by-Step Implementation

Let’s build our production-ready CI/CD pipeline step by step.

Step 1: Set Up GitHub Repository

First, we’ll create a proper Git repository structure for our project.

# Initialize git in your project
cd ecs-cicd-project
git init

# Create .gitignore
cat > .gitignore << 'EOF'
# Terraform
*.tfstate
*.tfstate.*
*.tfstate.backup
.terraform/
# .terraform.lock.hcl is intentionally NOT ignored: commit it so local
# runs and CI use the same provider versions
terraform.tfvars

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Python
__pycache__/
*.py[cod]
*.so
.Python
venv/
ENV/

EOF

# Add all files
git add .
git commit -m "Initial commit: ECS microservices project"

# Create repository on GitHub and push
git remote add origin https://github.com/YOUR_USERNAME/ecs-cicd-project.git
git branch -M main
git push -u origin main

Step 2: Configure GitHub Secrets

We’ll set up secure credentials for GitHub Actions to access AWS services.

2.1: Create IAM User for GitHub Actions

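For production use, create a dedicated IAM user for GitHub Actions with only the permissions the pipeline needs (or reuse existing credentials while experimenting). Attach the following policy; for tighter least privilege, scope iam:PassRole to the ARNs of your ECS task execution and task roles instead of "*":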

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecs:DescribeServices",
        "ecs:DescribeTaskDefinition",
        "ecs:DescribeTasks",
        "ecs:ListTasks",
        "ecs:RegisterTaskDefinition",
        "ecs:UpdateService",
        "iam:PassRole",
        "elasticloadbalancing:DescribeLoadBalancers"
      ],
      "Resource": "*"
    }
  ]
}

2.2: Add Secrets to GitHub

Go to your repository on GitHub:

  1. Click Settings
  2. Click Secrets and variables → Actions
  3. Click New repository secret

Add these secrets:

| Secret Name | Value | Description |
|---|---|---|
| AWS_ACCESS_KEY_ID | Your AWS access key | IAM user access key |
| AWS_SECRET_ACCESS_KEY | Your AWS secret key | IAM user secret key |
| AWS_REGION | ap-south-1 | AWS region |
| AWS_ACCOUNT_ID | Your AWS account ID | 12-digit account ID |
| ECR_FLASK_REPOSITORY | ecs-microservices/flask-app | Flask ECR repository name |
| ECR_NGINX_REPOSITORY | ecs-microservices/nginx | Nginx ECR repository name |
| ECS_CLUSTER_NAME | ecs-microservices-cluster | ECS cluster name |
| ECS_SERVICE_FLASK | ecs-microservices-flask-app | Flask service name |
| ECS_SERVICE_NGINX | ecs-microservices-nginx | Nginx service name |
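The deploy and rollback scripts later in this part hardcode these values, but if you adapt them to read the same names from your shell environment, it helps to fail fast when one is missing. A minimal sketch, assuming you export variables named after the secrets above; `require_vars` is a hypothetical helper, not part of the series’ code:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every unset or empty variable among the
# given names and return non-zero if any are missing.
require_vars() {
    local name missing=()
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name
        if [[ -z "${!name:-}" ]]; then
            missing+=("$name")
        fi
    done
    if (( ${#missing[@]} > 0 )); then
        echo "Missing required variables: ${missing[*]}" >&2
        return 1
    fi
}
```

For example, `require_vars AWS_REGION ECS_CLUSTER_NAME ECS_SERVICE_FLASK ECS_SERVICE_NGINX || exit 1` near the top of a script stops it before any AWS call is made.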

Step 3: Implement GitHub Actions Workflows

We’ll create comprehensive GitHub Actions workflows for automated CI/CD.

3.1: Set Up Workflow Directory

mkdir -p .github/workflows

3.2: Create Production Deployment Workflow

Our main deployment workflow handles the complete CI/CD process:

Create .github/workflows/deploy.yml:

name: Deploy to ECS

on:
  push:
    # Only main deploys here: this job targets the production environment
    branches:
      - main
  workflow_dispatch:

env:
  AWS_REGION: ${{ secrets.AWS_REGION }}
  ECR_FLASK_REPOSITORY: ${{ secrets.ECR_FLASK_REPOSITORY }}
  ECR_NGINX_REPOSITORY: ${{ secrets.ECR_NGINX_REPOSITORY }}
  ECS_CLUSTER: ${{ secrets.ECS_CLUSTER_NAME }}
  ECS_SERVICE_FLASK: ${{ secrets.ECS_SERVICE_FLASK }}
  ECS_SERVICE_NGINX: ${{ secrets.ECS_SERVICE_NGINX }}

jobs:
  build-and-deploy:
    name: Build and Deploy
    runs-on: ubuntu-latest
    environment: production

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Set image tag
        id: image-tag
        run: |
          # Use git commit SHA as image tag
          echo "IMAGE_TAG=${GITHUB_SHA::7}" >> $GITHUB_OUTPUT
          echo "Image tag: ${GITHUB_SHA::7}"          

      - name: Build and push Flask image
        id: build-flask
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ steps.image-tag.outputs.IMAGE_TAG }}
        run: |
          cd application/flask-app
          docker build -t $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG .
          docker tag $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:latest
          echo "image=$ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT          

      - name: Build and push Nginx image
        id: build-nginx
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ steps.image-tag.outputs.IMAGE_TAG }}
        run: |
          cd application/nginx
          docker build -t $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG .
          docker tag $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:latest
          echo "image=$ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT          

      - name: Download Flask task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition ecs-microservices-flask-app \
            --query taskDefinition > flask-task-definition.json          

      - name: Update Flask task definition
        id: flask-task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: flask-task-definition.json
          container-name: flask-app
          image: ${{ steps.build-flask.outputs.image }}

      - name: Deploy Flask to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.flask-task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE_FLASK }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true

      - name: Download Nginx task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition ecs-microservices-nginx \
            --query taskDefinition > nginx-task-definition.json          

      - name: Update Nginx task definition
        id: nginx-task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: nginx-task-definition.json
          container-name: nginx
          image: ${{ steps.build-nginx.outputs.image }}

      - name: Deploy Nginx to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.nginx-task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE_NGINX }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true

      - name: Verify deployment
        run: |
          echo "Flask image: ${{ steps.build-flask.outputs.image }}"
          echo "Nginx image: ${{ steps.build-nginx.outputs.image }}"

          # Get ALB DNS
          ALB_DNS=$(aws elbv2 describe-load-balancers \
            --names ecs-microservices-alb \
            --query 'LoadBalancers[0].DNSName' \
            --output text)

          # Fail the job if the application does not respond through the ALB
          curl --fail --silent --show-error --retry 5 --retry-delay 10 "http://$ALB_DNS/" > /dev/null

          echo "✅ Deployment verified"
          echo "Application URL: http://$ALB_DNS"

      - name: Deployment summary
        run: |
          echo "### Deployment Summary :rocket:" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "- **Commit**: ${{ github.sha }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Branch**: ${{ github.ref_name }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Image Tag**: ${{ steps.image-tag.outputs.IMAGE_TAG }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Flask Image**: ${{ steps.build-flask.outputs.image }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Nginx Image**: ${{ steps.build-nginx.outputs.image }}" >> $GITHUB_STEP_SUMMARY          
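The image reference passed to amazon-ecs-render-task-definition is simply the ECR registry, the repository name, and the first 7 characters of the commit SHA. A small pure-bash sketch of that composition, using hypothetical function names; the registry hostname format `<account>.dkr.ecr.<region>.amazonaws.com` is the same one scripts/deploy.sh builds by hand:

```bash
#!/usr/bin/env bash
# Hypothetical helper: short tag from a commit SHA, equivalent to ${GITHUB_SHA::7}
short_sha_tag() {
    local sha="$1"
    if [[ ! "$sha" =~ ^[0-9a-f]{7,40}$ ]]; then
        echo "not a hex commit SHA: $sha" >&2
        return 1
    fi
    echo "${sha:0:7}"
}

# Hypothetical helper: full ECR image URI <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ecr_image_uri() {
    local account="$1" region="$2" repo="$3" tag="$4"
    echo "${account}.dkr.ecr.${region}.amazonaws.com/${repo}:${tag}"
}
```

For example, `ecr_image_uri 123456789012 ap-south-1 ecs-microservices/flask-app "$(short_sha_tag "$(git rev-parse HEAD)")"` prints the URI the workflow would push for the commit you have checked out (the account ID is a placeholder).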

3.3: Create Infrastructure Validation Workflow

We’ll add Terraform validation to ensure infrastructure changes are valid:

Create .github/workflows/terraform.yml:

name: Terraform Validation

on:
  pull_request:
    paths:
      - "terraform/**"
  push:
    branches:
      - main
    paths:
      - "terraform/**"

permissions:
  contents: read
  pull-requests: write

jobs:
  terraform:
    name: Terraform Validation
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0

      - name: Terraform Format Check
        id: fmt
        run: terraform fmt -check -recursive
        working-directory: terraform
        continue-on-error: true

      - name: Terraform Init
        id: init
        run: terraform init -backend=false
        working-directory: terraform

      - name: Terraform Validate
        id: validate
        run: terraform validate
        working-directory: terraform

      - name: Comment PR
        if: always() && github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
            #### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
            #### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`

            *Pushed by: @${{ github.actor }}, Action: \`${{ github.event_name }}\`*`;

            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });
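You can run the same three checks locally before opening a pull request. A sketch, assuming Terraform 0.14 or later for the `-chdir` option; `tf_checks` is a hypothetical helper that reports each step as success, failure, or skipped, the same outcome values the PR comment shows. As in the workflow, a formatting failure does not stop the run, and validation is skipped if init fails:

```bash
#!/usr/bin/env bash
# Hypothetical local mirror of terraform.yml: fmt check, init without a
# backend, then validate. Prints one "step=outcome" summary line.
tf_checks() {
    local dir="${1:-terraform}" fmt init validate="skipped"
    # Formatting problems are reported but do not stop the run,
    # like continue-on-error: true in the workflow
    if terraform -chdir="$dir" fmt -check -recursive >/dev/null; then fmt=success; else fmt=failure; fi
    if terraform -chdir="$dir" init -backend=false -input=false >/dev/null; then
        init=success
        if terraform -chdir="$dir" validate >/dev/null; then validate=success; else validate=failure; fi
    else
        init=failure
    fi
    printf 'fmt=%s init=%s validate=%s\n' "$fmt" "$init" "$validate"
    # Exit status is non-zero unless every step succeeded
    [[ "$fmt" == success && "$init" == success && "$validate" == success ]]
}
```

Run `tf_checks terraform` from the repository root; because it returns non-zero on any failure, it can also serve as a Git pre-push hook.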

3.4: Create Pull Request Quality Checks

We’ll implement comprehensive quality checks for pull requests:

Create .github/workflows/pr-checks.yml:

name: PR Checks

on:
  pull_request:
    branches:
      - main

jobs:
  lint-and-test:
    name: Lint and Test
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          cd application/flask-app
          pip install -r requirements.txt
          pip install flake8 pytest          

      - name: Lint with flake8
        run: |
          cd application/flask-app
          # Fail the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # --exit-zero reports all other issues as warnings without failing the build
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

      - name: Test Docker builds
        run: |
          cd application
          docker compose build
          echo "✅ Docker images built successfully"          

Step 4: Create Production Deployment Scripts

We’ll create helper scripts for manual deployments and rollbacks.

4.1: Create Manual Deployment Script

Create scripts/deploy.sh:

#!/bin/bash

# Manual deployment script for ECS services
# Usage: ./deploy.sh [IMAGE_TAG]

set -e

# Configuration
AWS_REGION="ap-south-1"
ECS_CLUSTER="ecs-microservices-cluster"
FLASK_SERVICE="ecs-microservices-flask-app"
NGINX_SERVICE="ecs-microservices-nginx"
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Get image tag
IMAGE_TAG="${1:-latest}"

echo "🚀 Starting deployment with image tag: $IMAGE_TAG"

# ECR URLs
FLASK_IMAGE="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/ecs-microservices/flask-app:$IMAGE_TAG"
NGINX_IMAGE="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/ecs-microservices/nginx:$IMAGE_TAG"

echo "Flask image: $FLASK_IMAGE"
echo "Nginx image: $NGINX_IMAGE"

# Function to update service
update_service() {
    local service_name=$1
    local task_family=$2
    local container_name=$3
    local image=$4

    echo "📦 Updating $service_name..."

    # Get current task definition
    TASK_DEF=$(aws ecs describe-task-definition \
        --task-definition "$task_family" \
        --region "$AWS_REGION" \
        --query 'taskDefinition')

    # Create a new task definition with the updated image. Everything except
    # the read-only fields that describe-task-definition returns (and
    # register-task-definition rejects) is kept, so settings such as
    # volumes and runtimePlatform carry over to the new revision.
    NEW_TASK_DEF=$(echo "$TASK_DEF" | jq --arg IMAGE "$image" --arg CONTAINER "$container_name" '
        .containerDefinitions |= map(
            if .name == $CONTAINER then
                .image = $IMAGE
            else
                .
            end
        ) |
        del(.taskDefinitionArn, .revision, .status, .requiresAttributes,
            .compatibilities, .registeredAt, .registeredBy, .deregisteredAt)
    ')

    # Register new task definition
    NEW_TASK_ARN=$(aws ecs register-task-definition \
        --region "$AWS_REGION" \
        --cli-input-json "$NEW_TASK_DEF" \
        --query 'taskDefinition.taskDefinitionArn' \
        --output text)

    echo "New task definition: $NEW_TASK_ARN"

    # Update service
    aws ecs update-service \
        --cluster "$ECS_CLUSTER" \
        --service "$service_name" \
        --task-definition "$NEW_TASK_ARN" \
        --region "$AWS_REGION" \
        --force-new-deployment \
        > /dev/null

    echo "✅ $service_name updated successfully"
}

# Update Flask service
update_service "$FLASK_SERVICE" "ecs-microservices-flask-app" "flask-app" "$FLASK_IMAGE"

# Update Nginx service
update_service "$NGINX_SERVICE" "ecs-microservices-nginx" "nginx" "$NGINX_IMAGE"

# Wait for services to stabilize
echo "⏳ Waiting for services to stabilize..."
aws ecs wait services-stable \
    --cluster "$ECS_CLUSTER" \
    --services "$FLASK_SERVICE" "$NGINX_SERVICE" \
    --region "$AWS_REGION"

echo "✅ Deployment completed successfully!"

# Get ALB DNS
ALB_DNS=$(aws elbv2 describe-load-balancers \
    --names ecs-microservices-alb \
    --region "$AWS_REGION" \
    --query 'LoadBalancers[0].DNSName' \
    --output text)

echo "🌐 Application URL: http://$ALB_DNS"

Make it executable:

chmod +x scripts/deploy.sh

4.2: Create Production Rollback Script

For quick recovery from failed deployments, create scripts/rollback.sh:

#!/bin/bash

# Rollback script for ECS services
# Usage: ./rollback.sh

set -e

AWS_REGION="ap-south-1"
ECS_CLUSTER="ecs-microservices-cluster"

echo "🔄 Rolling back ECS services..."

# Function to roll back a service
rollback_service() {
    local service_name=$1

    echo "Rolling back $service_name..."

    # Get current task definition
    CURRENT_TASK=$(aws ecs describe-services \
        --cluster "$ECS_CLUSTER" \
        --services "$service_name" \
        --region "$AWS_REGION" \
        --query 'services[0].taskDefinition' \
        --output text)

    # Extract task family and revision
    TASK_FAMILY=$(echo "$CURRENT_TASK" | cut -d':' -f6 | cut -d'/' -f2)
    CURRENT_REVISION=$(echo "$CURRENT_TASK" | cut -d':' -f7)
    PREVIOUS_REVISION=$((CURRENT_REVISION - 1))

    if [ "$PREVIOUS_REVISION" -lt 1 ]; then
        echo "❌ No previous revision to rollback to"
        return 1
    fi

    PREVIOUS_TASK="$TASK_FAMILY:$PREVIOUS_REVISION"

    echo "Rolling back from $CURRENT_TASK to $PREVIOUS_TASK"

    # Update service with previous task definition
    aws ecs update-service \
        --cluster "$ECS_CLUSTER" \
        --service "$service_name" \
        --task-definition "$PREVIOUS_TASK" \
        --region "$AWS_REGION" \
        --force-new-deployment \
        > /dev/null

    echo "✅ $service_name rolled back"
}

# Rollback services
rollback_service "ecs-microservices-flask-app"
rollback_service "ecs-microservices-nginx"

# Wait for stability
echo "⏳ Waiting for services to stabilize..."
aws ecs wait services-stable \
    --cluster "$ECS_CLUSTER" \
    --services ecs-microservices-flask-app ecs-microservices-nginx \
    --region "$AWS_REGION"

echo "✅ Rollback completed successfully!"

Make it executable:

chmod +x scripts/rollback.sh

Step 5: Test and Validate the CI/CD Pipeline

Let’s thoroughly test our automated deployment pipeline.

5.1: Create a Test Change

Edit application/flask-app/app.py and change the version:

'version': '1.0.1',  # Changed from 1.0.0

5.2: Commit and Push

git add application/flask-app/app.py
git commit -m "Update Flask app version to 1.0.1"
git push origin main

5.3: Monitor Pipeline Execution

  1. Go to your GitHub repository
  2. Click Actions tab
  3. Watch the deployment workflow execute
  4. It should take ~10-15 minutes

5.4: Validate Production Deployment

Once complete, test your application:

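# Get the ALB DNS name from the GitHub Actions output, or from Terraform
# (run from the terraform/ directory):
ALB_DNS=$(terraform output -raw alb_dns_name)

# Test the updated version (-s hides curl's progress meter)
curl -s "http://$ALB_DNS/" | jq '.version'
# Should show: "1.0.1"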
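If you check while a rollout is still in progress, the ALB can still route some requests to old tasks, so a single curl may return the previous version. A sketch of a polling loop (the function name, attempt count, and delay are assumptions) that retries until the response contains the expected text; the fetch command is passed in as arguments, so the loop works with curl or any other command:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly until its output contains
# the expected text, or give up after max_attempts.
wait_for_output() {
    local expected="$1" max_attempts="$2" delay="$3"
    shift 3
    local attempt output
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        # Treat a failing command as empty output and keep polling
        output=$("$@" 2>/dev/null) || output=""
        if [[ "$output" == *"$expected"* ]]; then
            echo "Matched on attempt $attempt"
            return 0
        fi
        # No sleep after the final attempt
        (( attempt < max_attempts )) && sleep "$delay"
    done
    echo "Gave up after $max_attempts attempts" >&2
    return 1
}
```

For example, `wait_for_output '"1.0.1"' 20 15 curl -s "http://$ALB_DNS/"` checks every 15 seconds for up to 20 attempts.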

Step 6: Configure Production Security

6.1: Set Up Branch Protection Rules

For production environments, configure branch protection:

  1. Go to repository Settings → Branches
  2. Add branch protection rule for main:
    • ✅ Require a pull request before merging
    • ✅ Require status checks to pass
    • ✅ Require branches to be up to date

6.2: Configure Multi-Environment Support

For production use, set up multiple environments:

Environment Strategy:

  • Development: develop branch → dev environment
  • Staging: staging branch → staging environment
  • Production: main branch → production environment

Create .github/workflows/deploy-staging.yml:

name: Deploy to Staging

on:
  push:
    branches:
      - staging

env:
  AWS_REGION: ap-south-1
  ENVIRONMENT: staging
# Similar to deploy.yml but with staging-specific configuration

Production Usage Guide

Automated Deployment Workflow

Trigger: Every push to main branch automatically triggers deployment

git push origin main

Process:

  1. Code is validated and tested (the PR checks run before the merge)
  2. Docker images are built and pushed to ECR
  3. ECS task definitions are updated
  4. Services are deployed with zero downtime
  5. Health checks verify deployment success

Manual Deployment Options

Option 1: GitHub Actions UI

  1. Go to Actions → Deploy to ECS
  2. Click Run workflow
  3. Select branch and click Run workflow

Option 2: Local Script Deployment

# Build and push images locally
cd application
./build-and-push.sh v1.0.2

# Deploy using the script
cd ../scripts
./deploy.sh v1.0.2

Production Rollback Procedures

cd scripts
./rollback.sh

Manual Rollback via AWS Console

  1. Go to ECS → Clusters → your cluster → Services
  2. Select service → Update
  3. Select previous task definition revision
  4. Force new deployment

Emergency Rollback

  • Use the AWS CLI (aws ecs update-service --task-definition <family>:<revision>) to roll back immediately to the previous revision
  • Monitor CloudWatch logs for issues
  • Verify application health after rollback

Production Troubleshooting

Issue: GitHub Actions Workflow Fails at ECR Login

Symptoms: Workflow fails with “Unable to locate credentials” error

Solutions:

  1. Check AWS Credentials: Verify AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in GitHub secrets
  2. Verify IAM Permissions: Ensure IAM user has ECR permissions
  3. Check Region: Verify AWS_REGION secret is correct
  4. Test Credentials: Use AWS CLI to test credentials locally

Issue: Task Definition Update Fails

Symptoms: ECS service update fails with permission errors

Solutions:

  1. IAM Permissions: Ensure IAM user has ecs:RegisterTaskDefinition and iam:PassRole permissions
  2. Task Role: Verify task execution role has necessary permissions
  3. Resource Limits: Check CPU/memory limits in task definition
  4. Image Availability: Ensure new image exists in ECR

Issue: Service Update Fails

Symptoms: New tasks stay in the PENDING state, or the ECS deployment never completes

Solutions:

  1. Check Service Events: Review ECS console for error messages
  2. Image Verification: Ensure new image is accessible and valid
  3. Resource Constraints: Verify sufficient resources in cluster
  4. Health Checks: Ensure health check endpoints are responding
  5. Security Groups: Verify security group rules allow traffic

Issue: Deployment Times Out

Symptoms: Workflow times out waiting for service stability

Solutions:

  1. Health Check Configuration: Verify health check paths and intervals
  2. Security Groups: Ensure ALB can reach ECS tasks
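  3. Network Connectivity: Check the NAT Gateway (or VPC endpoints) so tasks in private subnets can pull images from ECR
  4. Resource Limits: Increase CPU/memory if needed
  5. Timeout Settings: Raise the wait-for-minutes input of aws-actions/amazon-ecs-deploy-task-definition (default: 30 minutes)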

Issue: Rollback Fails

Symptoms: Rollback script fails to revert to previous version

Solutions:

  1. Task Definition History: Check that the previous revision exists and is still ACTIVE (not deregistered)
  2. Service State: Ensure service is in stable state before rollback
  3. Permissions: Verify the IAM identity running the script has ecs:DescribeServices and ecs:UpdateService
  4. Manual Rollback: Use AWS Console for immediate rollback

Production Best Practices

Code Management

  1. Use Git Tags for Releases

    git tag -a v1.0.0 -m "Release version 1.0.0"
    git push origin v1.0.0
    
  2. Always Test Locally First

    docker compose up
    # Run tests locally before pushing
    
  3. Use Pull Requests for Code Review

    • Never push directly to main in production
    • Require PR reviews from team members
    • Run automated checks on all PRs
    • Use branch protection rules

Security Best Practices

  1. Keep Secrets Secure

    • Never commit terraform.tfvars or credentials
    • Use GitHub Secrets for all sensitive data
    • Rotate AWS credentials regularly
    • Use least privilege IAM policies
  2. Monitor Deployments

    # Monitor CloudWatch logs during deployment
    aws logs tail /ecs/ecs-microservices/flask-app --follow
    

Operational Excellence

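  1. Consider Blue-Green Deployments

    • This pipeline uses ECS rolling updates, which already avoid downtime behind the ALB
    • Blue/green deployments (for example with AWS CodeDeploy) let you shift traffic back to the old task set quickly
    • Test new versions in staging before production
    • Implement canary deployments for critical updates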
  2. Automated Testing

    • Run unit tests in CI pipeline
    • Implement integration tests
    • Use security scanning for container images
    • Validate infrastructure changes with Terraform
  3. Monitoring and Alerting

    • Set up CloudWatch alarms for service health
    • Monitor deployment success rates
    • Implement log aggregation and analysis
    • Use AWS X-Ray for distributed tracing

Next Steps

Part 5 Complete! Your production CI/CD pipeline now includes:

  • Fully Automated CI/CD with GitHub Actions
  • Automated Builds and Deployments using ECS rolling updates
  • Pull Request Quality Checks (lint, Docker builds, Terraform validation)
  • Production Rollback capabilities
  • A Starting Point for Multi-Environment Support (dev/staging/prod)
  • Security Best Practices with secrets management
  • Monitoring and Alerting Guidance for operational excellence

Proceed to CLEANUP.md when you’re ready to tear down resources and avoid charges.

Key Takeaways

CI/CD Pipeline Benefits

  1. Automated Deployment - GitHub Actions automates the entire deployment process
  2. Version Control - ECR stores versioned container images with proper tagging
  3. Zero Downtime - ECS rolling updates, gated by ALB health checks, replace tasks without interrupting traffic
  4. Quick Rollbacks - Easy rollback to previous versions when issues occur
  5. Quality Assurance - Automated testing catches issues before production

Production Readiness

  1. Security - Secure credential management with GitHub Secrets
  2. Reliability - Service stability checks and scripted rollback procedures
  3. Scalability - Multi-environment support for different deployment stages
  4. Monitoring - CloudWatch integration for deployment visibility
  5. Best Practices - Industry-standard DevOps practices implemented

Operational Excellence

  1. Automation - Complete CI/CD pipeline reduces manual errors
  2. Testing - Automated quality checks ensure code reliability
  3. Deployment - Rolling deployments minimize service disruption
  4. Recovery - Quick rollback capabilities for failed deployments
  5. Monitoring - Comprehensive logging and alerting for operations

Series Completion

🎉 Congratulations! You’ve successfully built a complete production-grade microservices architecture on AWS ECS with:

  • Part 1: Production-ready infrastructure with Terraform (VPC, subnets, security groups, RDS, ECR repositories)
  • Part 2: ECS cluster, IAM roles, and CloudWatch log groups
  • Part 3: Containerized applications built and tested locally
  • Part 4: ECS services deployed and running
  • Part 5: Complete CI/CD pipeline with GitHub Actions

This architecture provides a robust, scalable, and secure foundation for running microservices in production on AWS.


Ready for production? Your CI/CD pipeline is now ready to handle continuous deployments with confidence! In Part 6, we’ll clean up the source environment and temporary infrastructure.

Questions or feedback? Feel free to reach out in the comments below!
