Building Production-Grade ECS Microservices with CI/CD - Part 5: CI/CD Pipeline

Complete guide to implementing production-ready CI/CD pipelines for ECS microservices using GitHub Actions, covering automated builds, deployments, testing, and rollback strategies.

AWS DevOps cicd

October 22, 2025

Welcome to the final part of our comprehensive series on building production-grade microservices on AWS ECS. In this installment, we’ll implement a complete CI/CD pipeline using GitHub Actions that automates the entire deployment process, from code commit to production deployment.

What We’ll Build

In this final phase, we’ll create a production-ready CI/CD pipeline that automatically:

Builds Docker Images - Automated container image creation with versioning
Pushes to ECR - Secure image storage in Amazon Elastic Container Registry
Updates ECS Services - Automated task definition updates and service deployment
Validates Deployments - Comprehensive testing and health checks
Enables Rollbacks - Quick rollback capabilities for failed deployments
Supports Multi-Environment - Dev, staging, and production environments

CI/CD Pipeline Architecture

Our production-ready CI/CD pipeline follows modern DevOps practices:

┌─────────────────────────────────────────────────────────────────┐
│                    Developer Workflow                          │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │
│  │   Code      │    │   Pull      │    │   Merge     │        │
│  │   Commit    │───▶│   Request   │───▶│   to Main   │        │
│  └─────────────┘    └─────────────┘    └─────────────┘        │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                  GitHub Actions Pipeline                       │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │
│  │   Code      │    │   Build     │    │   Test      │        │
│  │   Checkout  │───▶│   Images    │───▶│   & Lint    │        │
│  └─────────────┘    └─────────────┘    └─────────────┘        │
│                                                               │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │
│  │   Push to    │    │   Update   │    │   Deploy    │        │
│  │   ECR        │───▶│   Task     │───▶│   to ECS    │        │
│  │              │    │   Defs     │    │            │        │
│  └─────────────┘    └─────────────┘    └─────────────┘        │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                    AWS Infrastructure                          │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │
│  │   ECR       │    │   ECS       │    │   ALB       │        │
│  │   Images    │───▶│   Services  │───▶│   Health    │        │
│  │   Stored    │    │   Updated   │    │   Checks    │        │
│  └─────────────┘    └─────────────┘    └─────────────┘        │
└─────────────────────────────────────────────────────────────────┘

Prerequisites

Before we begin, ensure you have completed the previous phases:

Infrastructure Requirements

Part 1 Complete: VPC, subnets, security groups, RDS database, ECR repositories
Part 2 Complete: ECS cluster, IAM roles, CloudWatch log groups
Part 3 Complete: Containerized applications tested locally
Part 4 Complete: ECS services deployed and running

GitHub Requirements

GitHub Repository: Code repository with proper structure
GitHub Actions: Enabled for the repository
Branch Protection: Optional but recommended for production

AWS Permissions

Ensure your AWS credentials have the following permissions:

ECR: ecr:GetAuthorizationToken, ecr:PutImage, ecr:BatchGetImage
ECS: ecs:DescribeServices, ecs:UpdateService, ecs:RegisterTaskDefinition
IAM: iam:PassRole (for ECS task execution)
ELB: elasticloadbalancing:DescribeLoadBalancers (for health checks)

Step-by-Step Implementation

Let’s build our production-ready CI/CD pipeline step by step.

Step 1: Set Up GitHub Repository

First, we’ll create a proper Git repository structure for our project.

# Initialize git in your project
cd ecs-cicd-project
git init

# Create .gitignore
cat > .gitignore << 'EOF'
# Terraform
*.tfstate
*.tfstate.*
*.tfstate.backup
.terraform/
.terraform.lock.hcl
terraform.tfvars

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Python
__pycache__/
*.py[cod]
*.so
.Python
venv/
ENV/

EOF

# Add all files
git add .
git commit -m "Initial commit: ECS microservices project"

# Create repository on GitHub and push
git remote add origin https://github.com/YOUR_USERNAME/ecs-cicd-project.git
git branch -M main
git push -u origin main

Step 2: Configure GitHub Secrets

We’ll set up secure credentials for GitHub Actions to access AWS services.

2.1: Create IAM User for GitHub Actions

For production use, create a dedicated IAM user that has only the permissions the pipeline needs, and attach the following policy. Existing credentials also work for a quick test, but a dedicated user is easier to audit and rotate:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecs:DescribeServices",
        "ecs:DescribeTaskDefinition",
        "ecs:DescribeTasks",
        "ecs:ListTasks",
        "ecs:RegisterTaskDefinition",
        "ecs:UpdateService",
        "iam:PassRole",
        "elasticloadbalancing:DescribeLoadBalancers"
      ],
      "Resource": "*"
    }
  ]
}
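
Before attaching the policy, it can help to confirm that it covers the actions the workflow actually calls. The following is a minimal sketch, assuming the policy is saved locally as github-actions-policy.json (a hypothetical file name). It does a plain-text match on quoted action names, so it will not understand wildcards such as ecs:*.

```shell
# Actions the deployment workflow relies on (a representative subset).
REQUIRED_ACTIONS="ecr:GetAuthorizationToken ecr:PutImage ecs:RegisterTaskDefinition ecs:UpdateService iam:PassRole elasticloadbalancing:DescribeLoadBalancers"

# Print every required action that does not appear in the policy file.
# Assumption: each action is written as a quoted string in the JSON.
missing_actions() {
    policy_file=$1
    for action in $REQUIRED_ACTIONS; do
        # -F matches a fixed string, -q suppresses output
        if ! grep -Fq "\"$action\"" "$policy_file"; then
            echo "$action"
        fi
    done
}
```

Running `missing_actions github-actions-policy.json` prints nothing when every listed action is present, and one action per line otherwise.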

2.2: Add Secrets to GitHub

Go to your repository on GitHub:

Click Settings
Click Secrets and variables → Actions
Click New repository secret

Add these secrets:

Secret Name	Value	Description
`AWS_ACCESS_KEY_ID`	Your AWS access key	IAM user access key
`AWS_SECRET_ACCESS_KEY`	Your AWS secret key	IAM user secret key
`AWS_REGION`	`ap-south-1`	AWS region
`AWS_ACCOUNT_ID`	Your AWS account ID	12-digit account ID
`ECR_FLASK_REPOSITORY`	`ecs-microservices/flask-app`	Flask ECR repo name
`ECR_NGINX_REPOSITORY`	`ecs-microservices/nginx`	Nginx ECR repo name
`ECS_CLUSTER_NAME`	`ecs-microservices-cluster`	ECS cluster name
`ECS_SERVICE_FLASK`	`ecs-microservices-flask-app`	Flask service name
`ECS_SERVICE_NGINX`	`ecs-microservices-nginx`	Nginx service name
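
The same secrets can also be added from a terminal. This sketch assumes the GitHub CLI (gh) is installed, authenticated with `gh auth login`, and run from inside the repository; the function names are illustrative, not part of the project.

```shell
# AWS account IDs are exactly 12 digits; catch typos before storing one.
is_valid_account_id() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{12}$'
}

# Store the non-credential values from the table as repository secrets.
# Requires an authenticated gh CLI; nothing runs until this is called.
set_pipeline_secrets() {
    account_id=$1
    if ! is_valid_account_id "$account_id"; then
        echo "Invalid AWS account ID: $account_id" >&2
        return 1
    fi
    gh secret set AWS_REGION --body "ap-south-1"
    gh secret set AWS_ACCOUNT_ID --body "$account_id"
    gh secret set ECR_FLASK_REPOSITORY --body "ecs-microservices/flask-app"
    gh secret set ECR_NGINX_REPOSITORY --body "ecs-microservices/nginx"
    gh secret set ECS_CLUSTER_NAME --body "ecs-microservices-cluster"
    gh secret set ECS_SERVICE_FLASK --body "ecs-microservices-flask-app"
    gh secret set ECS_SERVICE_NGINX --body "ecs-microservices-nginx"
}
```

For the two access-key secrets, running `gh secret set AWS_ACCESS_KEY_ID` without `--body` prompts for the value, which keeps the key out of shell history.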

Step 3: Implement GitHub Actions Workflows

We’ll create comprehensive GitHub Actions workflows for automated CI/CD.

3.1: Set Up Workflow Directory

mkdir -p .github/workflows

3.2: Create Production Deployment Workflow

Our main deployment workflow handles the complete CI/CD process:

Create .github/workflows/deploy.yml:

name: Deploy to ECS

on:
  push:
    branches:
      - main
  workflow_dispatch:

env:
  AWS_REGION: ${{ secrets.AWS_REGION }}
  ECR_FLASK_REPOSITORY: ${{ secrets.ECR_FLASK_REPOSITORY }}
  ECR_NGINX_REPOSITORY: ${{ secrets.ECR_NGINX_REPOSITORY }}
  ECS_CLUSTER: ${{ secrets.ECS_CLUSTER_NAME }}
  ECS_SERVICE_FLASK: ${{ secrets.ECS_SERVICE_FLASK }}
  ECS_SERVICE_NGINX: ${{ secrets.ECS_SERVICE_NGINX }}

jobs:
  build-and-deploy:
    name: Build and Deploy
    runs-on: ubuntu-latest
    environment: production

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Set image tag
        id: image-tag
        run: |
          # Use git commit SHA as image tag
          echo "IMAGE_TAG=${GITHUB_SHA::7}" >> $GITHUB_OUTPUT
          echo "Image tag: ${GITHUB_SHA::7}"          

      - name: Build and push Flask image
        id: build-flask
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ steps.image-tag.outputs.IMAGE_TAG }}
        run: |
          cd application/flask-app
          docker build -t $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG .
          docker tag $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:latest
          echo "image=$ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT          

      - name: Build and push Nginx image
        id: build-nginx
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ steps.image-tag.outputs.IMAGE_TAG }}
        run: |
          cd application/nginx
          docker build -t $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG .
          docker tag $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:latest
          echo "image=$ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT          

      - name: Download Flask task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition ecs-microservices-flask-app \
            --query taskDefinition > flask-task-definition.json          

      - name: Update Flask task definition
        id: flask-task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: flask-task-definition.json
          container-name: flask-app
          image: ${{ steps.build-flask.outputs.image }}

      - name: Deploy Flask to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.flask-task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE_FLASK }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true

      - name: Download Nginx task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition ecs-microservices-nginx \
            --query taskDefinition > nginx-task-definition.json          

      - name: Update Nginx task definition
        id: nginx-task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: nginx-task-definition.json
          container-name: nginx
          image: ${{ steps.build-nginx.outputs.image }}

      - name: Deploy Nginx to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.nginx-task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE_NGINX }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true

      - name: Verify deployment
        run: |
          echo "✅ Deployment completed successfully!"
          echo "Flask image: ${{ steps.build-flask.outputs.image }}"
          echo "Nginx image: ${{ steps.build-nginx.outputs.image }}"

          # Get ALB DNS
          ALB_DNS=$(aws elbv2 describe-load-balancers \
            --names ecs-microservices-alb \
            --query 'LoadBalancers[0].DNSName' \
            --output text)

          echo "Application URL: http://$ALB_DNS"          

      - name: Deployment summary
        run: |
          echo "### Deployment Summary :rocket:" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "- **Commit**: ${{ github.sha }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Branch**: ${{ github.ref_name }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Image Tag**: ${{ steps.image-tag.outputs.IMAGE_TAG }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Flask Image**: ${{ steps.build-flask.outputs.image }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Nginx Image**: ${{ steps.build-nginx.outputs.image }}" >> $GITHUB_STEP_SUMMARY
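
The Verify deployment step prints the application URL but does not request it. One way to turn it into a real smoke test is a small retry helper, sketched below. The `/health` path in the usage note is an assumption; substitute whatever endpoint your Flask app exposes.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $n/$attempts failed" >&2
        n=$((n + 1))
        # Only sleep if another attempt is coming
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

Inside the workflow step this could be called as `retry 10 15 curl -fsS "http://$ALB_DNS/health"`, which fails the job if the endpoint never returns a successful response.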

3.3: Create Infrastructure Validation Workflow

We’ll add Terraform validation to ensure infrastructure changes are valid:

Create .github/workflows/terraform.yml:

name: Terraform Validation

on:
  pull_request:
    paths:
      - "terraform/**"
  push:
    branches:
      - main
    paths:
      - "terraform/**"

jobs:
  terraform:
    name: Terraform Validation
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write # needed by the "Comment PR" step

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0

      - name: Terraform Format Check
        id: fmt
        run: terraform fmt -check -recursive
        working-directory: terraform
        continue-on-error: true

      - name: Terraform Init
        id: init
        run: terraform init -backend=false
        working-directory: terraform

      - name: Terraform Validate
        id: validate
        run: terraform validate
        working-directory: terraform

      - name: Comment PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
            #### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
            #### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`

            *Pushed by: @${{ github.actor }}, Action: \`${{ github.event_name }}\`*`;

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })
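
To catch formatting and validation errors before opening a pull request, the same three commands can be run locally. This is a sketch that assumes Terraform is installed and the configuration lives in the terraform directory; unlike the workflow, it stops at the first failing check. The TERRAFORM variable is a hypothetical override for pointing at a specific binary.

```shell
# Run fmt, init (without a backend), and validate in one go.
# $1 = Terraform directory (defaults to "terraform")
tf_checks() {
    tf_dir=${1:-terraform}
    tf_bin=${TERRAFORM:-terraform}
    (
        cd "$tf_dir" || exit 1
        "$tf_bin" fmt -check -recursive &&
            "$tf_bin" init -backend=false &&
            "$tf_bin" validate
    )
}
```

Call it from the repository root as `tf_checks`, or pass a different directory as the first argument.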

3.4: Create Pull Request Quality Checks

We’ll implement comprehensive quality checks for pull requests:

Create .github/workflows/pr-checks.yml:

name: PR Checks

on:
  pull_request:
    branches:
      - main

jobs:
  lint-and-test:
    name: Lint and Test
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          cd application/flask-app
          pip install -r requirements.txt
          pip install flake8 pytest          

      - name: Lint with flake8
        run: |
          cd application/flask-app
          # Stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # Exit-zero treats all errors as warnings
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics          

      - name: Test Docker builds
        run: |
          cd application
          docker compose build
          echo "✅ Docker images built successfully"
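
The workflow installs pytest but never runs it. A test step could look like the sketch below, which assumes tests live somewhere under application/flask-app. pytest exits with status 5 when it collects no tests, so the helper treats that as a pass until tests exist.

```shell
# pytest exit codes: 0 = all passed, 5 = no tests collected.
pytest_status_ok() {
    [ "$1" -eq 0 ] || [ "$1" -eq 5 ]
}

# Run pytest in the app directory and tolerate an empty test suite.
run_tests() {
    app_dir=${1:-application/flask-app}
    status=0
    (cd "$app_dir" && python -m pytest -q) || status=$?
    if pytest_status_ok "$status"; then
        return 0
    fi
    return "$status"
}
```

Once real tests are in place, dropping the status-5 exception makes a missing test suite fail the build as well.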

Step 4: Create Production Deployment Scripts

We’ll create helper scripts for manual deployments and rollbacks.

4.1: Create Manual Deployment Script

Create scripts/deploy.sh:

#!/bin/bash

# Manual deployment script for ECS services
# Usage: ./deploy.sh [IMAGE_TAG]

set -e

# Configuration
AWS_REGION="ap-south-1"
ECS_CLUSTER="ecs-microservices-cluster"
FLASK_SERVICE="ecs-microservices-flask-app"
NGINX_SERVICE="ecs-microservices-nginx"
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Get image tag
IMAGE_TAG="${1:-latest}"

echo "🚀 Starting deployment with image tag: $IMAGE_TAG"

# ECR URLs
FLASK_IMAGE="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/ecs-microservices/flask-app:$IMAGE_TAG"
NGINX_IMAGE="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/ecs-microservices/nginx:$IMAGE_TAG"

echo "Flask image: $FLASK_IMAGE"
echo "Nginx image: $NGINX_IMAGE"

# Function to update service
update_service() {
    local service_name=$1
    local task_family=$2
    local container_name=$3
    local image=$4

    echo "📦 Updating $service_name..."

    # Get current task definition
    TASK_DEF=$(aws ecs describe-task-definition \
        --task-definition "$task_family" \
        --region "$AWS_REGION" \
        --query 'taskDefinition')

    # Create new task definition with updated image.
    # describe-task-definition returns read-only fields that
    # register-task-definition rejects, so remove them and keep
    # every other setting (volumes, roles, and so on) unchanged.
    NEW_TASK_DEF=$(echo "$TASK_DEF" | jq --arg IMAGE "$image" --arg CONTAINER "$container_name" '
        .containerDefinitions |= map(
            if .name == $CONTAINER then
                .image = $IMAGE
            else
                .
            end
        ) |
        del(
            .taskDefinitionArn,
            .revision,
            .status,
            .requiresAttributes,
            .compatibilities,
            .registeredAt,
            .registeredBy,
            .deregisteredAt
        )
    ')

    # Register new task definition
    NEW_TASK_ARN=$(aws ecs register-task-definition \
        --region "$AWS_REGION" \
        --cli-input-json "$NEW_TASK_DEF" \
        --query 'taskDefinition.taskDefinitionArn' \
        --output text)

    echo "New task definition: $NEW_TASK_ARN"

    # Update service
    aws ecs update-service \
        --cluster "$ECS_CLUSTER" \
        --service "$service_name" \
        --task-definition "$NEW_TASK_ARN" \
        --region "$AWS_REGION" \
        --force-new-deployment \
        > /dev/null

    echo "✅ $service_name updated successfully"
}

# Update Flask service
update_service "$FLASK_SERVICE" "ecs-microservices-flask-app" "flask-app" "$FLASK_IMAGE"

# Update Nginx service
update_service "$NGINX_SERVICE" "ecs-microservices-nginx" "nginx" "$NGINX_IMAGE"

# Wait for services to stabilize
echo "⏳ Waiting for services to stabilize..."
aws ecs wait services-stable \
    --cluster "$ECS_CLUSTER" \
    --services "$FLASK_SERVICE" "$NGINX_SERVICE" \
    --region "$AWS_REGION"

echo "✅ Deployment completed successfully!"

# Get ALB DNS
ALB_DNS=$(aws elbv2 describe-load-balancers \
    --names ecs-microservices-alb \
    --region "$AWS_REGION" \
    --query 'LoadBalancers[0].DNSName' \
    --output text)

echo "🌐 Application URL: http://$ALB_DNS"

Make it executable:

chmod +x scripts/deploy.sh
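
The script defaults to the latest tag, which is mutable. Passing the same short commit SHA that the workflow pushes makes a manual deployment reproducible. The helpers below are an illustrative sketch of how that tag and the full image URI are put together; the function names are not part of the project.

```shell
# First seven characters of a commit SHA, matching the workflow's image tag.
short_sha() {
    printf '%.7s\n' "$1"
}

# Build a full ECR image URI.
# $1 = account ID, $2 = region, $3 = repository, $4 = tag
ecr_image_uri() {
    printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}
```

For example, `./scripts/deploy.sh "$(short_sha "$(git rev-parse HEAD)")"` deploys the images built from the current commit, provided the workflow has already pushed them to ECR.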

4.2: Create Production Rollback Script

For quick recovery from failed deployments, create scripts/rollback.sh:

#!/bin/bash

# Rollback script for ECS services
# Usage: ./rollback.sh

set -e

AWS_REGION="ap-south-1"
ECS_CLUSTER="ecs-microservices-cluster"

echo "🔄 Rolling back ECS services..."

# Function to rollback service
rollback_service() {
    local service_name=$1

    echo "Rolling back $service_name..."

    # Get current task definition
    CURRENT_TASK=$(aws ecs describe-services \
        --cluster "$ECS_CLUSTER" \
        --services "$service_name" \
        --region "$AWS_REGION" \
        --query 'services[0].taskDefinition' \
        --output text)

    # Extract task family and revision
    TASK_FAMILY=$(echo "$CURRENT_TASK" | cut -d':' -f6 | cut -d'/' -f2)
    CURRENT_REVISION=$(echo "$CURRENT_TASK" | cut -d':' -f7)
    PREVIOUS_REVISION=$((CURRENT_REVISION - 1))

    if [ "$PREVIOUS_REVISION" -lt 1 ]; then
        echo "❌ No previous revision to rollback to"
        return 1
    fi

    PREVIOUS_TASK="$TASK_FAMILY:$PREVIOUS_REVISION"

    echo "Rolling back from $CURRENT_TASK to $PREVIOUS_TASK"

    # Update service with previous task definition
    aws ecs update-service \
        --cluster "$ECS_CLUSTER" \
        --service "$service_name" \
        --task-definition "$PREVIOUS_TASK" \
        --region "$AWS_REGION" \
        --force-new-deployment \
        > /dev/null

    echo "✅ $service_name rolled back"
}

# Rollback services
rollback_service "ecs-microservices-flask-app"
rollback_service "ecs-microservices-nginx"

# Wait for stability
echo "⏳ Waiting for services to stabilize..."
aws ecs wait services-stable \
    --cluster "$ECS_CLUSTER" \
    --services ecs-microservices-flask-app ecs-microservices-nginx \
    --region "$AWS_REGION"

echo "✅ Rollback completed successfully!"

Make it executable:

chmod +x scripts/rollback.sh
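
The rollback script assumes that revision N-1 exists and is still usable. If that revision has been deregistered, the service update will fail. The sketch below shows one way to parse the task definition ARN with shell parameter expansion and to check a revision's status first; `revision_is_active` needs AWS credentials and is only a suggestion, not part of the script above.

```shell
# ARN format: arn:aws:ecs:REGION:ACCOUNT:task-definition/FAMILY:REVISION

# Print the task family from a task definition ARN.
task_family() {
    family_rev=${1##*/}            # strip everything up to the last "/"
    printf '%s\n' "${family_rev%:*}"   # drop the trailing ":REVISION"
}

# Print the revision number from a task definition ARN.
task_revision() {
    printf '%s\n' "${1##*:}"
}

# Succeed only if FAMILY:REVISION ($1) is still ACTIVE.
revision_is_active() {
    status=$(aws ecs describe-task-definition \
        --task-definition "$1" \
        --query 'taskDefinition.status' \
        --output text) || return 1
    [ "$status" = "ACTIVE" ]
}
```

A guard such as `revision_is_active "$PREVIOUS_TASK" || return 1` before the update-service call would stop the rollback early with a clear failure.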

Step 5: Test and Validate the CI/CD Pipeline

Let’s thoroughly test our automated deployment pipeline.

5.1: Create a Test Change

Edit application/flask-app/app.py and change the version:

'version': '1.0.1',  # Changed from 1.0.0

5.2: Commit and Push

git add application/flask-app/app.py
git commit -m "Update Flask app version to 1.0.1"
git push origin main

5.3: Monitor Pipeline Execution

Go to your GitHub repository
Click Actions tab
Watch the deployment workflow execute
It should take ~10-15 minutes

5.4: Validate Production Deployment

Once complete, test your application:

# Get ALB DNS from GitHub Actions output or:
ALB_DNS=$(terraform output -raw alb_dns_name)

# Test the updated version
curl http://$ALB_DNS/ | jq '.version'
# Should show: "1.0.1"

Step 6: Configure Production Security

6.1: Set Up Branch Protection Rules

For production environments, configure branch protection:

Go to repository Settings → Branches
Add branch protection rule for main:
- ✅ Require a pull request before merging
- ✅ Require status checks to pass
- ✅ Require branches to be up to date
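
These rules can also be applied through the GitHub REST API. The sketch below assumes an authenticated gh CLI with admin access to the repository. The required check name ("Lint and Test", the job name from the PR checks workflow), the single required review, and `enforce_admins` are assumptions to adjust for your team.

```shell
# JSON body for PUT /repos/OWNER/REPO/branches/BRANCH/protection.
# $1 = name of the status check that must pass before merging.
branch_protection_payload() {
    cat <<EOF
{
  "required_status_checks": { "strict": true, "contexts": ["$1"] },
  "enforce_admins": true,
  "required_pull_request_reviews": { "required_approving_review_count": 1 },
  "restrictions": null
}
EOF
}

# Apply the rule to main. $1 = OWNER/REPO (placeholder).
protect_main() {
    branch_protection_payload "Lint and Test" |
        gh api --method PUT "repos/$1/branches/main/protection" --input -
}
```

Here `"strict": true` corresponds to "Require branches to be up to date", and `contexts` lists the status checks that must pass.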

6.2: Configure Multi-Environment Support

For production use, set up multiple environments:

Environment Strategy:

Development: develop branch → dev environment
Staging: staging branch → staging environment
Production: main branch → production environment

Create .github/workflows/deploy-staging.yml:

name: Deploy to Staging

on:
  push:
    branches:
      - staging

env:
  AWS_REGION: ap-south-1
  ENVIRONMENT: staging
# Similar to deploy.yml but with staging-specific configuration
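
If a single workflow should serve all three environments instead, the branch-to-environment mapping from the strategy above can be expressed as a small helper. This is a sketch; the function name is illustrative.

```shell
# Map a branch name to its target environment.
env_for_branch() {
    case "$1" in
        main)    echo "production" ;;
        staging) echo "staging" ;;
        develop) echo "dev" ;;
        *)
            echo "No environment configured for branch: $1" >&2
            return 1
            ;;
    esac
}
```

In a workflow step, `ENVIRONMENT=$(env_for_branch "$GITHUB_REF_NAME")` resolves the environment from the branch that triggered the run and fails for any other branch.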

Production Usage Guide

Automated Deployment Workflow

Trigger: Every push to main branch automatically triggers deployment

git push origin main

Process:

Code is validated and tested
Docker images are built and pushed to ECR
ECS task definitions are updated
Services are deployed with zero downtime
Health checks verify deployment success

Manual Deployment Options

Option 1: GitHub Actions UI

Go to Actions → Deploy to ECS
Click Run workflow
Select branch and click Run workflow

Option 2: Local Script Deployment

# Build and push images locally
cd application
./build-and-push.sh v1.0.2

# Deploy using the script
cd ../scripts
./deploy.sh v1.0.2

Production Rollback Procedures

Quick Rollback (Recommended)

cd scripts
./rollback.sh

Manual Rollback via AWS Console

Go to ECS → Clusters → Services
Select service → Update
Select previous task definition revision
Force new deployment

Emergency Rollback

Use AWS CLI to immediately rollback to previous version
Monitor CloudWatch logs for issues
Verify application health after rollback
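
For the CLI route, the core of an emergency rollback is a single update-service call that points the service at a known-good revision. A minimal sketch follows; AWS_CLI is a hypothetical override that lets you preview the command with `AWS_CLI=echo` before running it for real.

```shell
# Point a service back at a specific task definition revision.
# $1 = cluster, $2 = service, $3 = FAMILY:REVISION
rollback_to() {
    cluster=$1
    service=$2
    task_def=$3
    "${AWS_CLI:-aws}" ecs update-service \
        --cluster "$cluster" \
        --service "$service" \
        --task-definition "$task_def"
}
```

To find candidate revisions, `aws ecs list-task-definitions --family-prefix ecs-microservices-flask-app --sort DESC --max-items 5` lists the most recent ones. Follow the rollback with `aws ecs wait services-stable` as in the scripts above.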

Production Troubleshooting

Issue: GitHub Actions Workflow Fails at ECR Login

Symptoms: Workflow fails with “Unable to locate credentials” error

Solutions:

Check AWS Credentials: Verify AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in GitHub secrets
Verify IAM Permissions: Ensure IAM user has ECR permissions
Check Region: Verify AWS_REGION secret is correct
Test Credentials: Use AWS CLI to test credentials locally
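
A quick local check along these lines can narrow the problem down. The sketch assumes the credentials are supplied through environment variables, as they are in the workflow.

```shell
# Report which of the expected AWS variables are unset or empty.
check_aws_env() {
    missing=0
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
        # Indirect lookup of the variable called $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_aws_env && aws sts get-caller-identity` then confirms that the keys are valid and shows which IAM identity they belong to.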

Issue: Task Definition Update Fails

Symptoms: ECS service update fails with permission errors

Solutions:

IAM Permissions: Ensure IAM user has ecs:RegisterTaskDefinition and iam:PassRole permissions
Task Role: Verify task execution role has necessary permissions
Resource Limits: Check CPU/memory limits in task definition
Image Availability: Ensure new image exists in ECR

Issue: Service Update Fails

Symptoms: ECS service remains in “pending” state

Solutions:

Check Service Events: Review ECS console for error messages
Image Verification: Ensure new image is accessible and valid
Resource Constraints: Verify sufficient resources in cluster
Health Checks: Ensure health check endpoints are responding
Security Groups: Verify security group rules allow traffic
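
Service events are usually the fastest way to see why tasks are not starting. The sketch below pulls the most recent ones from the CLI rather than the console; AWS_CLI is a hypothetical override used only to preview the command.

```shell
# Print the five most recent events for a service.
# $1 = cluster, $2 = service
recent_service_events() {
    "${AWS_CLI:-aws}" ecs describe-services \
        --cluster "$1" \
        --services "$2" \
        --query 'services[0].events[:5].message' \
        --output text
}
```

For example, `recent_service_events ecs-microservices-cluster ecs-microservices-flask-app` typically surfaces messages about failed health checks, image pull errors, or insufficient capacity.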

Issue: Deployment Times Out

Symptoms: Workflow times out waiting for service stability

Solutions:

Health Check Configuration: Verify health check paths and intervals
Security Groups: Ensure ALB can reach ECS tasks
Network Connectivity: Check NAT Gateway and internet access
Resource Limits: Increase CPU/memory if needed
Timeout Settings: Adjust workflow timeout values

Issue: Rollback Fails

Symptoms: Rollback script fails to revert to previous version

Solutions:

Task Definition History: Check if previous revision exists
Service State: Ensure service is in stable state before rollback
Permissions: Verify IAM user has rollback permissions
Manual Rollback: Use AWS Console for immediate rollback

Production Best Practices

Code Management

Use Git Tags for Releases

git tag -a v1.0.0 -m "Release version 1.0.0"
git push origin v1.0.0
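
To keep release tags consistent, a small check can reject names that do not follow the vMAJOR.MINOR.PATCH pattern used above. This is a sketch; the stricter parts of semantic versioning (pre-release and build suffixes) are deliberately not handled.

```shell
# Succeed only for tags shaped like v1.0.0.
is_release_tag() {
    printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}
```

It can guard the tagging command, for example `is_release_tag "$TAG" && git tag -a "$TAG" -m "Release version ${TAG#v}"`.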

Always Test Locally First

docker-compose up
# Run tests locally before pushing

Use Pull Requests for Code Review
- Never push directly to main in production
- Require PR reviews from team members
- Run automated checks on all PRs
- Use branch protection rules

Security Best Practices

Keep Secrets Secure
- Never commit terraform.tfvars or credentials
- Use GitHub Secrets for all sensitive data
- Rotate AWS credentials regularly
- Use least privilege IAM policies

Monitor Deployments

# Monitor CloudWatch logs during deployment
aws logs tail /ecs/ecs-microservices/flask-app --follow

Operational Excellence

Implement Blue-Green Deployments
- Use ECS service updates for zero-downtime deployments
- Test new versions in staging before production
- Implement canary deployments for critical updates

Automated Testing
- Run unit tests in CI pipeline
- Implement integration tests
- Use security scanning for container images
- Validate infrastructure changes with Terraform

Monitoring and Alerting
- Set up CloudWatch alarms for service health
- Monitor deployment success rates
- Implement log aggregation and analysis
- Use AWS X-Ray for distributed tracing
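
As a starting point for service health alarms, the sketch below creates a CPU alarm for one ECS service. The 80% threshold, the 5-minute period, and the alarm name are assumptions to tune for your workload; AWS_CLI is a hypothetical override for previewing the command.

```shell
# Create an alarm that fires when average CPU stays above 80%
# for two consecutive 5-minute periods.
# $1 = cluster, $2 = service
create_cpu_alarm() {
    cluster=$1
    service=$2
    "${AWS_CLI:-aws}" cloudwatch put-metric-alarm \
        --alarm-name "${service}-high-cpu" \
        --namespace AWS/ECS \
        --metric-name CPUUtilization \
        --dimensions "Name=ClusterName,Value=$cluster" "Name=ServiceName,Value=$service" \
        --statistic Average \
        --period 300 \
        --evaluation-periods 2 \
        --threshold 80 \
        --comparison-operator GreaterThanThreshold
}
```

As written, the alarm only changes state. Adding `--alarm-actions` with an SNS topic ARN is what turns it into a notification.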

Next Steps

✅ Part 5 Complete! Your production CI/CD pipeline now includes:

Fully Automated CI/CD with GitHub Actions
Automated Builds and Deployments with zero downtime
Comprehensive Testing and quality checks
Production Rollback capabilities
Multi-Environment Support for dev/staging/prod
Security Best Practices with secrets management
Monitoring and Alerting for operational excellence

Proceed to CLEANUP.md when you’re ready to tear down resources and avoid charges.

Key Takeaways

CI/CD Pipeline Benefits

Automated Deployment - GitHub Actions automates the entire deployment process
Version Control - ECR stores versioned container images with proper tagging
Zero Downtime - ECS rolling deployments replace tasks gradually, so the service stays available during updates
Quick Rollbacks - Easy rollback to previous versions when issues occur
Quality Assurance - Automated testing catches issues before production

Production Readiness

Security - Secure credential management with GitHub Secrets
Reliability - Comprehensive error handling and rollback procedures
Scalability - Multi-environment support for different deployment stages
Monitoring - CloudWatch integration for deployment visibility
Best Practices - Industry-standard DevOps practices implemented

Operational Excellence

Automation - Complete CI/CD pipeline reduces manual errors
Testing - Automated quality checks ensure code reliability
Deployment - Rolling ECS service updates minimize service disruption
Recovery - Quick rollback capabilities for failed deployments
Monitoring - Comprehensive logging and alerting for operations

Series Completion

🎉 Congratulations! You’ve successfully built a complete production-grade microservices architecture on AWS ECS with:

Part 1: Production-ready infrastructure with Terraform (VPC, RDS, ECR)
Part 2: ECS cluster, IAM roles, and CloudWatch log groups
Part 3: Containerized applications with Docker best practices
Part 4: ECS service deployment with auto-scaling and load balancing
Part 5: Complete CI/CD pipeline with GitHub Actions

This architecture provides a robust, scalable, and secure foundation for running microservices in production on AWS.

Ready for production? Your CI/CD pipeline can now handle continuous deployments with confidence! In Part 6, we’ll clean up the source environment and temporary infrastructure.

Questions or feedback? Feel free to reach out in the comments below!
