Complete guide to implementing production-ready CI/CD pipelines for ECS microservices using GitHub Actions, covering automated builds, deployments, testing, and rollback strategies.
Welcome to the final part of our comprehensive series on building production-grade microservices on AWS ECS. In this installment, we’ll implement a complete CI/CD pipeline using GitHub Actions that automates the entire deployment process, from code commit to production deployment.
In this final phase, we’ll create a production-ready CI/CD pipeline that automatically:
Our production-ready CI/CD pipeline follows modern DevOps practices:
┌─────────────────────────────────────────────────────────────────┐
│                       Developer Workflow                        │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │    Code     │    │    Pull     │    │    Merge    │          │
│  │   Commit    │───▶│   Request   │───▶│   to Main   │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                     GitHub Actions Pipeline                     │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │    Code     │    │    Build    │    │    Test     │          │
│  │  Checkout   │───▶│   Images    │───▶│   & Lint    │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   Push to   │    │   Update    │    │   Deploy    │          │
│  │     ECR     │───▶│    Task     │───▶│   to ECS    │          │
│  │             │    │    Defs     │    │             │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                       AWS Infrastructure                        │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │     ECR     │    │     ECS     │    │     ALB     │          │
│  │   Images    │───▶│  Services   │───▶│   Health    │          │
│  │   Stored    │    │   Updated   │    │   Checks    │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└─────────────────────────────────────────────────────────────────┘
Before we begin, ensure you have completed the previous phases:
Ensure your AWS credentials have the following permissions:
- ecr:GetAuthorizationToken, ecr:PutImage, ecr:BatchGetImage
- ecs:DescribeServices, ecs:UpdateService, ecs:RegisterTaskDefinition
- iam:PassRole (for ECS task execution)
- elasticloadbalancing:DescribeLoadBalancers (for health checks)
Let’s build our production-ready CI/CD pipeline step by step.
First, we’ll create a proper Git repository structure for our project.
# Initialize git in your project
cd ecs-cicd-project
git init
# Create .gitignore
cat > .gitignore << 'EOF'
# Terraform
*.tfstate
*.tfstate.*
*.tfstate.backup
.terraform/
# .terraform.lock.hcl is deliberately not ignored: commit it so CI resolves the same provider versions
terraform.tfvars
# IDE
.vscode/
.idea/
*.swp
*.swo
# OS
.DS_Store
Thumbs.db
# Python
__pycache__/
*.py[cod]
*.so
.Python
venv/
ENV/
# Docker
# Keep .dockerignore tracked so CI builds use the same build context
EOF
# Add all files
git add .
git commit -m "Initial commit: ECS microservices project"
# Create repository on GitHub and push
git remote add origin https://github.com/YOUR_USERNAME/ecs-cicd-project.git
git branch -M main
git push -u origin main
We’ll set up secure credentials for GitHub Actions to access AWS services.
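For production use, create a dedicated IAM user with only the permissions the pipeline needs (or reuse your existing credentials) and attach the following policy. It uses "Resource": "*" for brevity; scope it to your own ECR repositories, ECS cluster, and task roles to tighten it further: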
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:PutImage",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload",
"ecs:DescribeServices",
"ecs:DescribeTaskDefinition",
"ecs:DescribeTasks",
"ecs:ListTasks",
"ecs:RegisterTaskDefinition",
"ecs:UpdateService",
"iam:PassRole",
"elasticloadbalancing:DescribeLoadBalancers"
],
"Resource": "*"
}
]
}
Go to your repository on GitHub and open Settings → Secrets and variables → Actions.
Add these secrets:
| Secret Name | Value | Description |
|---|---|---|
| AWS_ACCESS_KEY_ID | Your AWS access key | IAM user access key |
| AWS_SECRET_ACCESS_KEY | Your AWS secret key | IAM user secret key |
| AWS_REGION | ap-south-1 | AWS region |
| AWS_ACCOUNT_ID | Your AWS account ID | 12-digit account ID |
| ECR_FLASK_REPOSITORY | ecs-microservices/flask-app | Flask ECR repo name |
| ECR_NGINX_REPOSITORY | ecs-microservices/nginx | Nginx ECR repo name |
| ECS_CLUSTER_NAME | ecs-microservices-cluster | ECS cluster name |
| ECS_SERVICE_FLASK | ecs-microservices-flask-app | Flask service name |
| ECS_SERVICE_NGINX | ecs-microservices-nginx | Nginx service name |
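
Missing or misnamed secrets tend to surface only as opaque failures halfway through a run. As a minimal sketch (the helper name `missing_vars` is hypothetical and not part of the original project), a preflight check could report which required variables are empty before any AWS call is made:

```bash
# Hypothetical helper: print the name of every variable that is unset or empty.
# Returns non-zero if at least one of the given names is missing.
missing_vars() {
  rc=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      rc=1
    fi
  done
  return "$rc"
}

# Example usage in a workflow step or a local shell:
# missing_vars AWS_REGION ECS_CLUSTER ECS_SERVICE_FLASK ECS_SERVICE_NGINX \
#   || { echo "Set the variables listed above" >&2; exit 1; }
```

The variable names in the usage comment match the `env` block of the deployment workflow; adjust the list to whatever your own steps rely on.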
We’ll create comprehensive GitHub Actions workflows for automated CI/CD.
mkdir -p .github/workflows
Our main deployment workflow handles the complete CI/CD process:
Create .github/workflows/deploy.yml:
name: Deploy to ECS
on:
  push:
    branches:
      # Only main deploys to the production environment
      - main
  workflow_dispatch:
env:
  AWS_REGION: ${{ secrets.AWS_REGION }}
  ECR_FLASK_REPOSITORY: ${{ secrets.ECR_FLASK_REPOSITORY }}
  ECR_NGINX_REPOSITORY: ${{ secrets.ECR_NGINX_REPOSITORY }}
  ECS_CLUSTER: ${{ secrets.ECS_CLUSTER_NAME }}
  ECS_SERVICE_FLASK: ${{ secrets.ECS_SERVICE_FLASK }}
  ECS_SERVICE_NGINX: ${{ secrets.ECS_SERVICE_NGINX }}
jobs:
  build-and-deploy:
    name: Build and Deploy
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Set image tag
        id: image-tag
        run: |
          # Use git commit SHA as image tag
          echo "IMAGE_TAG=${GITHUB_SHA::7}" >> $GITHUB_OUTPUT
          echo "Image tag: ${GITHUB_SHA::7}"
      - name: Build and push Flask image
        id: build-flask
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ steps.image-tag.outputs.IMAGE_TAG }}
        run: |
          cd application/flask-app
          docker build -t $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG .
          docker tag $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_FLASK_REPOSITORY:latest
          echo "image=$ECR_REGISTRY/$ECR_FLASK_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
      - name: Build and push Nginx image
        id: build-nginx
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ steps.image-tag.outputs.IMAGE_TAG }}
        run: |
          cd application/nginx
          docker build -t $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG .
          docker tag $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_NGINX_REPOSITORY:latest
          echo "image=$ECR_REGISTRY/$ECR_NGINX_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
      - name: Download Flask task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition ecs-microservices-flask-app \
            --query taskDefinition > flask-task-definition.json
      - name: Update Flask task definition
        id: flask-task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: flask-task-definition.json
          container-name: flask-app
          image: ${{ steps.build-flask.outputs.image }}
      - name: Deploy Flask to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.flask-task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE_FLASK }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true
      - name: Download Nginx task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition ecs-microservices-nginx \
            --query taskDefinition > nginx-task-definition.json
      - name: Update Nginx task definition
        id: nginx-task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: nginx-task-definition.json
          container-name: nginx
          image: ${{ steps.build-nginx.outputs.image }}
      - name: Deploy Nginx to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.nginx-task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE_NGINX }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true
      - name: Verify deployment
        run: |
          echo "✅ Deployment completed successfully!"
          echo "Flask image: ${{ steps.build-flask.outputs.image }}"
          echo "Nginx image: ${{ steps.build-nginx.outputs.image }}"
          # Get ALB DNS
          ALB_DNS=$(aws elbv2 describe-load-balancers \
            --names ecs-microservices-alb \
            --query 'LoadBalancers[0].DNSName' \
            --output text)
          echo "Application URL: http://$ALB_DNS"
      - name: Deployment summary
        run: |
          echo "### Deployment Summary :rocket:" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "- **Commit**: ${{ github.sha }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Branch**: ${{ github.ref_name }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Image Tag**: ${{ steps.image-tag.outputs.IMAGE_TAG }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Flask Image**: ${{ steps.build-flask.outputs.image }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Nginx Image**: ${{ steps.build-nginx.outputs.image }}" >> $GITHUB_STEP_SUMMARY
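
The workflow tags every image with the first seven characters of the commit SHA, so each running task can be traced back to a commit. As a rough sketch of that step outside GitHub Actions (the function name `image_tag_from_sha` is an assumption made for illustration), the same tag can be derived in a plain shell:

```bash
# Derive a short image tag from a full commit SHA (first 7 characters),
# mirroring the ${GITHUB_SHA::7} expansion used in the workflow.
image_tag_from_sha() {
  printf '%.7s\n' "$1"
}

# Example usage from a local checkout:
# IMAGE_TAG=$(image_tag_from_sha "$(git rev-parse HEAD)")
# echo "Image tag: $IMAGE_TAG"
```

Passing the resulting tag to a manual deployment keeps locally built images consistent with the ones the pipeline produces.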
We’ll add Terraform validation to ensure infrastructure changes are valid:
Create .github/workflows/terraform.yml:
name: Terraform Validation
on:
  pull_request:
    paths:
      - "terraform/**"
  push:
    branches:
      - main
    paths:
      - "terraform/**"
# Allow the workflow to post the validation comment on pull requests
permissions:
  contents: read
  pull-requests: write
jobs:
  terraform:
    name: Terraform Validation
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
      - name: Terraform Format Check
        id: fmt
        run: terraform fmt -check -recursive
        working-directory: terraform
        continue-on-error: true
      - name: Terraform Init
        id: init
        run: terraform init -backend=false
        working-directory: terraform
      - name: Terraform Validate
        id: validate
        run: terraform validate
        working-directory: terraform
      - name: Comment PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
            #### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
            #### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`
            *Pushed by: @${{ github.actor }}, Action: \`${{ github.event_name }}\`*`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })
We’ll implement comprehensive quality checks for pull requests:
Create .github/workflows/pr-checks.yml:
name: PR Checks
on:
  pull_request:
    branches:
      - main
jobs:
  lint-and-test:
    name: Lint and Test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          cd application/flask-app
          pip install -r requirements.txt
          pip install flake8 pytest
      - name: Lint with flake8
        run: |
          cd application/flask-app
          # Stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # Exit-zero treats all remaining findings as warnings
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
      - name: Test Docker builds
        run: |
          cd application
          # GitHub-hosted runners ship Compose v2, invoked as "docker compose"
          docker compose build
          echo "✅ Docker images built successfully"
We’ll create helper scripts for manual deployments and rollbacks.
Create scripts/deploy.sh:
#!/bin/bash
# Manual deployment script for ECS services
# Usage: ./deploy.sh [IMAGE_TAG]
set -e
# Configuration
AWS_REGION="ap-south-1"
ECS_CLUSTER="ecs-microservices-cluster"
FLASK_SERVICE="ecs-microservices-flask-app"
NGINX_SERVICE="ecs-microservices-nginx"
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# Get image tag
IMAGE_TAG="${1:-latest}"
echo "🚀 Starting deployment with image tag: $IMAGE_TAG"
# ECR URLs
FLASK_IMAGE="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/ecs-microservices/flask-app:$IMAGE_TAG"
NGINX_IMAGE="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/ecs-microservices/nginx:$IMAGE_TAG"
echo "Flask image: $FLASK_IMAGE"
echo "Nginx image: $NGINX_IMAGE"
# Function to update service
update_service() {
local service_name=$1
local task_family=$2
local container_name=$3
local image=$4
echo "📦 Updating $service_name..."
# Get current task definition
TASK_DEF=$(aws ecs describe-task-definition \
--task-definition "$task_family" \
--region "$AWS_REGION" \
--query 'taskDefinition')
# Create new task definition with updated image
NEW_TASK_DEF=$(echo "$TASK_DEF" | jq --arg IMAGE "$image" --arg CONTAINER "$container_name" '
.containerDefinitions |= map(
if .name == $CONTAINER then
.image = $IMAGE
else
.
end
) |
{
family: .family,
networkMode: .networkMode,
taskRoleArn: .taskRoleArn,
executionRoleArn: .executionRoleArn,
containerDefinitions: .containerDefinitions,
requiresCompatibilities: .requiresCompatibilities,
cpu: .cpu,
memory: .memory
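}
# Drop null fields (e.g. an unset taskRoleArn), which register-task-definition rejects
| with_entries(select(.value != null))
')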
# Register new task definition
NEW_TASK_ARN=$(aws ecs register-task-definition \
--region "$AWS_REGION" \
--cli-input-json "$NEW_TASK_DEF" \
--query 'taskDefinition.taskDefinitionArn' \
--output text)
echo "New task definition: $NEW_TASK_ARN"
# Update service
aws ecs update-service \
--cluster "$ECS_CLUSTER" \
--service "$service_name" \
--task-definition "$NEW_TASK_ARN" \
--region "$AWS_REGION" \
--force-new-deployment \
> /dev/null
echo "✅ $service_name updated successfully"
}
# Update Flask service
update_service "$FLASK_SERVICE" "ecs-microservices-flask-app" "flask-app" "$FLASK_IMAGE"
# Update Nginx service
update_service "$NGINX_SERVICE" "ecs-microservices-nginx" "nginx" "$NGINX_IMAGE"
# Wait for services to stabilize
echo "⏳ Waiting for services to stabilize..."
aws ecs wait services-stable \
--cluster "$ECS_CLUSTER" \
--services "$FLASK_SERVICE" "$NGINX_SERVICE" \
--region "$AWS_REGION"
echo "✅ Deployment completed successfully!"
# Get ALB DNS
ALB_DNS=$(aws elbv2 describe-load-balancers \
--names ecs-microservices-alb \
--region "$AWS_REGION" \
--query 'LoadBalancers[0].DNSName' \
--output text)
echo "🌐 Application URL: http://$ALB_DNS"
Make it executable:
chmod +x scripts/deploy.sh
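
The deploy script assembles each image reference from the account ID, region, repository, and tag. The sketch below isolates that step in a small function with a basic sanity check on the account ID; the name `ecr_image_uri` is hypothetical and the check is an illustration rather than part of the original script:

```bash
# Hypothetical helper: compose an ECR image URI the same way deploy.sh does.
# Usage: ecr_image_uri <account_id> <region> <repository> [tag]
ecr_image_uri() {
  account_id=$1 region=$2 repository=$3 tag=${4:-latest}
  # AWS account IDs are exactly 12 digits; reject anything else early
  case "$account_id" in
    ''|*[!0-9]*) echo "invalid account id: $account_id" >&2; return 1 ;;
  esac
  if [ "${#account_id}" -ne 12 ]; then
    echo "invalid account id: $account_id" >&2
    return 1
  fi
  echo "$account_id.dkr.ecr.$region.amazonaws.com/$repository:$tag"
}

# Example usage (123456789012 is a placeholder account ID):
# ecr_image_uri 123456789012 ap-south-1 ecs-microservices/flask-app v1.0.2
```

Failing fast on a malformed account ID avoids registering a task definition that points at a registry that cannot exist.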
For quick recovery from failed deployments, create scripts/rollback.sh:
#!/bin/bash
# Rollback script for ECS services
# Usage: ./rollback.sh
set -e
AWS_REGION="ap-south-1"
ECS_CLUSTER="ecs-microservices-cluster"
echo "🔄 Rolling back ECS services..."
# Function to rollback service
rollback_service() {
local service_name=$1
echo "Rolling back $service_name..."
# Get current task definition
CURRENT_TASK=$(aws ecs describe-services \
--cluster "$ECS_CLUSTER" \
--services "$service_name" \
--region "$AWS_REGION" \
--query 'services[0].taskDefinition' \
--output text)
# Extract task family and revision
TASK_FAMILY=$(echo "$CURRENT_TASK" | cut -d':' -f6 | cut -d'/' -f2)
CURRENT_REVISION=$(echo "$CURRENT_TASK" | cut -d':' -f7)
PREVIOUS_REVISION=$((CURRENT_REVISION - 1))
if [ "$PREVIOUS_REVISION" -lt 1 ]; then
echo "❌ No previous revision to rollback to"
return 1
fi
PREVIOUS_TASK="$TASK_FAMILY:$PREVIOUS_REVISION"
echo "Rolling back from $CURRENT_TASK to $PREVIOUS_TASK"
# Update service with previous task definition
aws ecs update-service \
--cluster "$ECS_CLUSTER" \
--service "$service_name" \
--task-definition "$PREVIOUS_TASK" \
--region "$AWS_REGION" \
--force-new-deployment \
> /dev/null
echo "✅ $service_name rolled back"
}
# Rollback services
rollback_service "ecs-microservices-flask-app"
rollback_service "ecs-microservices-nginx"
# Wait for stability
echo "⏳ Waiting for services to stabilize..."
aws ecs wait services-stable \
--cluster "$ECS_CLUSTER" \
--services ecs-microservices-flask-app ecs-microservices-nginx \
--region "$AWS_REGION"
echo "✅ Rollback completed successfully!"
Make it executable:
chmod +x scripts/rollback.sh
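
The rollback script depends on splitting a task definition ARN into its family and revision and then stepping back one revision. The following sketch shows that parsing with shell parameter expansion instead of `cut`; the function name `previous_task_definition` is an assumption, and it does not verify that the previous revision is still registered (a deregistered revision would make `update-service` fail):

```bash
# Split a task definition ARN into family and revision, then print FAMILY:(REVISION-1).
# Expected ARN shape: arn:aws:ecs:REGION:ACCOUNT:task-definition/FAMILY:REVISION
previous_task_definition() {
  arn=$1
  family_and_rev=${arn##*/}        # FAMILY:REVISION
  family=${family_and_rev%:*}      # FAMILY
  revision=${family_and_rev##*:}   # REVISION
  case "$revision" in
    ''|*[!0-9]*) echo "unexpected ARN: $arn" >&2; return 1 ;;
  esac
  if [ "$revision" -le 1 ]; then
    echo "no previous revision for $family" >&2
    return 1
  fi
  echo "$family:$((revision - 1))"
}

# Example usage (placeholder account ID):
# previous_task_definition arn:aws:ecs:ap-south-1:123456789012:task-definition/ecs-microservices-flask-app:7
```

Keeping the parsing in one function makes it easy to test without touching AWS, which is useful because a wrong revision number only shows up at rollback time.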
Let’s thoroughly test our automated deployment pipeline.
Edit application/flask-app/app.py and change the version:
'version': '1.0.1', # Changed from 1.0.0
git add application/flask-app/app.py
git commit -m "Update Flask app version to 1.0.1"
git push origin main
Once complete, test your application:
# Get ALB DNS from GitHub Actions output, or run this from the terraform/ directory:
ALB_DNS=$(terraform output -raw alb_dns_name)
# Test the updated version
curl http://$ALB_DNS/ | jq '.version'
# Should show: "1.0.1"
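
Right after a deployment the load balancer may need a short while before the new tasks pass health checks, so a single `curl` can fail even when the rollout is fine. A small retry wrapper, sketched here under the assumption that a fixed delay is good enough (the name `retry` is hypothetical), makes the check less flaky:

```bash
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max_attempts=$1 delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example: wait for the ALB to answer before checking the version
# retry 10 15 curl -fsS -o /dev/null "http://$ALB_DNS/"
```

The attempt count and delay in the example are arbitrary; tune them to the health check interval configured on your target group.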
For production environments, configure branch protection for main.
For production use, set up multiple environments:
Environment Strategy:
- develop branch → dev environment
- staging branch → staging environment
- main branch → production environment
Create .github/workflows/deploy-staging.yml:
name: Deploy to Staging
on:
  push:
    branches:
      # Matches the environment strategy: the staging branch deploys to staging
      - staging
env:
  AWS_REGION: ap-south-1
  ENVIRONMENT: staging
# Similar to deploy.yml but with staging-specific configuration
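
The branch-to-environment strategy can also live in one place instead of being repeated across workflow files. As a sketch (the function name `environment_for_branch` is hypothetical), a shell step could resolve the target environment from the branch name:

```bash
# Map a branch name to its target environment, following the strategy:
# develop -> dev, staging -> staging, main -> production.
environment_for_branch() {
  case "$1" in
    main)    echo "production" ;;
    staging) echo "staging" ;;
    develop) echo "dev" ;;
    *)
      echo "no environment mapped for branch: $1" >&2
      return 1
      ;;
  esac
}

# Example usage inside a workflow step, where GITHUB_REF_NAME holds the branch name:
# ENVIRONMENT=$(environment_for_branch "$GITHUB_REF_NAME") || exit 1
# echo "Deploying to $ENVIRONMENT"
```

Rejecting unmapped branches explicitly keeps a feature branch from being deployed by accident if the workflow trigger is ever widened.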
Trigger: Every push to main branch automatically triggers deployment
git push origin main
Process:
# Build and push images locally
cd application
./build-and-push.sh v1.0.2
# Deploy using the script
cd ../scripts
./deploy.sh v1.0.2
./deploy.sh v1.0.2
cd scripts
./rollback.sh
Symptoms: Workflow fails with “Unable to locate credentials” error
Solutions:
- Verify AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in GitHub secrets
- Check that the AWS_REGION secret is correct
Symptoms: ECS service update fails with permission errors
Solutions:
- Verify the IAM user has ecs:RegisterTaskDefinition and iam:PassRole permissions
Symptoms: ECS service remains in “pending” state
Solutions:
Symptoms: Workflow times out waiting for service stability
Solutions:
Symptoms: Rollback script fails to revert to previous version
Solutions:
Use Git Tags for Releases
git tag -a v1.0.0 -m "Release version 1.0.0"
git push origin v1.0.0
Always Test Locally First
docker-compose up
# Run tests locally before pushing
Use Pull Requests for Code Review
- Never push directly to main in production
Keep Secrets Secure
- Never commit terraform.tfvars or credentials
Monitor Deployments
# Monitor CloudWatch logs during deployment
aws logs tail /ecs/ecs-microservices/flask-app --follow
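
For the release-tagging practice above, a quick format check before running `git tag` helps keep tags consistent. This is a minimal sketch that assumes tags follow the `vMAJOR.MINOR.PATCH` pattern used in the examples; the function name `is_release_tag` is hypothetical:

```bash
# Return success only for tags shaped like v1.0.0 (vMAJOR.MINOR.PATCH).
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Example usage:
# TAG=v1.0.2
# is_release_tag "$TAG" || { echo "Tag must look like v1.0.0" >&2; exit 1; }
# git tag -a "$TAG" -m "Release version ${TAG#v}"
```

Pre-release suffixes such as `-rc1` are rejected by this pattern; widen the expression if your release process uses them.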
Implement Blue-Green Deployments
Automated Testing
Monitoring and Alerting
✅ Part 5 Complete! Your production CI/CD pipeline now includes:
Proceed to CLEANUP.md when you’re ready to tear down resources and avoid charges.
🎉 Congratulations! You’ve successfully built a complete production-grade microservices architecture on AWS ECS with:
This architecture provides a robust, scalable, and secure foundation for running microservices in production on AWS.
Ready for production? Your CI/CD pipeline is now ready to handle continuous deployments with confidence! Next up is Part 6, where we’ll clean up the source environment and temporary infrastructure.
Questions or feedback? Feel free to reach out in the comments below!