AWS Secure Document Pipeline - Part 1: Building Production-Ready S3 Infrastructure with Terraform

Learn how to build a secure, production-grade S3-based document processing pipeline using Infrastructure as Code. Complete Terraform setup with 5 S3 buckets, automatic replication, and strict IAM policies.

Introduction

Building a secure, scalable document processing pipeline is crucial for modern applications that handle sensitive documents. This guide walks you through creating a production-grade, S3-based document processing pipeline using Infrastructure as Code (Terraform). The foundation includes 5 S3 buckets with automatic replication, strict IAM policies, and comprehensive security controls.

What You’ll Learn

  • How to design a secure multi-bucket S3 architecture
  • Setting up automatic cross-bucket replication with Terraform
  • Configuring strict IAM policies with IP-based restrictions
  • Implementing comprehensive logging for compliance
  • Building production-ready infrastructure with proper security controls
  • Complete cleanup procedures to avoid ongoing charges

Prerequisites

  • AWS Account with administrative access
  • AWS CLI (v2.x or higher) installed and configured
  • Terraform (v1.5.0 or higher) installed
  • Your public IP address (find it at: https://whatismyipaddress.com/ or via the command shown after this list)
  • Basic understanding of S3, IAM, and Terraform concepts
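
If you prefer the command line, you can also look up your public IPv4 address with AWS's check-ip endpoint (a quick sketch; remember to append /32 before using the value as a CIDR in Terraform):

# Print your current public IPv4 address
curl -s https://checkip.amazonaws.com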

Architecture Overview

Our secure document processing pipeline provides multiple layers of protection and automation:

┌─────────────────────────────────────────────────────────────────────┐
│                           Third Party Client                        │
│                      (IP-Restricted IAM User)                       │
└────────────┬──────────────────────────────────────┬─────────────────┘
             │ Upload PDF                           │ Download Result
             ↓                                      ↑
    ┌─────────────────┐                   ┌─────────────────┐
    │  uploads bucket │                   │ delivery bucket │
    │  (Versioned +   │                   │  (Versioned +   │
    │   Encrypted)    │                   │   Encrypted)    │
    └────────┬────────┘                   └────────┬────────┘
             │                                      ↑
             │ S3 Replication                      │ S3 Replication
             ↓                                      │
    ┌────────────────────────┐           ┌─────────────────────────┐
    │ internal-processing    │           │ processed-output bucket │
    │       bucket           │           │    (Versioned +         │
    │  (Versioned +          │──────────→│     Encrypted)          │
    │   Encrypted)           │  Lambda   └─────────────────────────┘
    └────────────────────────┘  Process
                                 (Part 2)

    ┌─────────────────────────────────────────────────────────────────┐
    │              compliance-logs bucket                             │
    │         (Receives server access logs from all buckets)          │
    └─────────────────────────────────────────────────────────────────┘

Key Benefits

  • Security First: IP-restricted access, encryption at rest, and comprehensive logging
  • Automated Processing: S3 events trigger Lambda functions for document processing
  • Disaster Recovery: Cross-bucket replication ensures data availability
  • Compliance Ready: Server access logging and audit trails
  • Cost Optimized: Lifecycle policies and intelligent storage management
  • Scalable Architecture: Infrastructure as Code for easy replication and management

Data Flow

  1. Third party uploads PDF to uploads bucket (restricted by IP address)
  2. S3 automatically replicates file to internal-processing bucket
  3. Lambda processes the file and saves output to processed-output bucket (Part 2)
  4. S3 automatically replicates processed file to delivery bucket
  5. Third party downloads processed file from delivery bucket
  6. All bucket access logs flow to compliance-logs bucket

Step-by-Step Setup

Phase 1: Prerequisites and Environment Setup

Required Tools Installation

AWS CLI Installation:

Windows:

# Using MSI installer
# Download from: https://awscli.amazonaws.com/AWSCLIV2.msi
msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi

Linux:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

macOS:

curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /

Verify Installation:

aws --version
# Expected output: aws-cli/2.x.x Python/3.x.x...

Configure AWS CLI:

aws configure
# AWS Access Key ID: [Enter your access key]
# AWS Secret Access Key: [Enter your secret key]
# Default region name: ap-south-1
# Default output format: json

Terraform Installation:

Windows (using Chocolatey):

choco install terraform

Linux:

wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

macOS:

brew tap hashicorp/tap
brew install hashicorp/tap/terraform

Verify Installation:

terraform version
# Expected output: Terraform v1.5.x or higher

Phase 2: Create Project Directory Structure

Create a dedicated directory for this project:

# Navigate to your projects folder
cd /path/to/your/projects

# Create project directory
mkdir secure-doc-pipeline
cd secure-doc-pipeline

# Create Terraform directory
mkdir terraform
cd terraform

Your directory structure will look like:

secure-doc-pipeline/
└── terraform/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── terraform.tfvars

Phase 3: Create Terraform Configuration Files

Step 3.1: Create variables.tf

This file defines all configurable parameters for the infrastructure.

Create file: terraform/variables.tf

variable "aws_region" {
  description = "The AWS region to deploy resources in"
  type        = string
  default     = "ap-south-1"
}

variable "project_name" {
  description = "A unique name for the project to prefix all resources"
  type        = string
  default     = "secure-doc-pipeline"
}

variable "third_party_ip" {
  description = "The trusted IP address of the third party (CIDR notation)"
  type        = string
  # IMPORTANT: Replace this with your actual IP address
  # Find your IP at: https://whatismyipaddress.com/
  # Add /32 at the end for a single IP
  default     = "YOUR_IP_ADDRESS/32"
}

variable "enable_logging" {
  description = "Enable S3 server access logging for compliance"
  type        = bool
  default     = true
}

variable "force_destroy_buckets" {
  description = "Allow buckets to be destroyed even if they contain objects (useful for dev/test)"
  type        = bool
  default     = true  # Set to false for production
}

variable "versioning_enabled" {
  description = "Enable versioning on all buckets"
  type        = bool
  default     = true
}

Step 3.2: Create main.tf

This is the core infrastructure definition file.

Create file: terraform/main.tf

# ============================================
# Provider Configuration
# ============================================
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = var.project_name
      ManagedBy   = "Terraform"
      Environment = "Development"
      Purpose     = "Secure Document Processing Pipeline"
    }
  }
}

# ============================================
# Local Variables
# ============================================
locals {
  # Define all bucket names
  bucket_names = {
    uploads              = "${var.project_name}-uploads"
    internal_processing  = "${var.project_name}-internal-processing"
    processed_output     = "${var.project_name}-processed-output"
    delivery            = "${var.project_name}-delivery"
    compliance_logs     = "${var.project_name}-compliance-logs"
  }

  # Buckets that will have replication enabled
  replication_source_buckets = [
    local.bucket_names.uploads,
    local.bucket_names.processed_output
  ]

  # Buckets that will send logs to compliance bucket
  logged_buckets = [
    local.bucket_names.uploads,
    local.bucket_names.internal_processing,
    local.bucket_names.processed_output,
    local.bucket_names.delivery
  ]
}

# ============================================
# S3 Buckets Creation
# ============================================
resource "aws_s3_bucket" "doc_buckets" {
  for_each = local.bucket_names

  bucket = each.value

  # Allow Terraform to destroy bucket even with objects (dev/test only)
  force_destroy = var.force_destroy_buckets

  tags = {
    Name = each.value
    Type = each.key
  }
}

# ============================================
# S3 Bucket Versioning
# ============================================
resource "aws_s3_bucket_versioning" "versioning" {
  for_each = aws_s3_bucket.doc_buckets

  bucket = each.value.id

  versioning_configuration {
    status = var.versioning_enabled ? "Enabled" : "Suspended"
  }
}

# ============================================
# S3 Bucket Encryption (AES256)
# ============================================
resource "aws_s3_bucket_server_side_encryption_configuration" "encryption" {
  for_each = aws_s3_bucket.doc_buckets

  bucket = each.value.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
    bucket_key_enabled = true
  }
}

# ============================================
# S3 Public Access Block (Security Best Practice)
# ============================================
resource "aws_s3_bucket_public_access_block" "pab" {
  for_each = aws_s3_bucket.doc_buckets

  bucket = each.value.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# ============================================
# S3 Bucket Lifecycle Rules (Cost Optimization)
# ============================================
resource "aws_s3_bucket_lifecycle_configuration" "lifecycle" {
  for_each = {
    for k,v in aws_s3_bucket.doc_buckets : k => v
    if k != "compliance_logs"
  }
  bucket = each.value.id

  rule {
    id = "delete-old-versions"
    status = "Enabled"

    filter {}  # Apply to all objects

    noncurrent_version_expiration {
      noncurrent_days = 90
    }
  }

  rule {
    id = "cleanup-incomplete-uploads"
    status = "Enabled"

    filter {}  # Apply to all objects

    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }
  }
}

# ============================================
# S3 Server Access Logging
# ============================================
resource "aws_s3_bucket_logging" "access_logging" {
  for_each = {
    for bucket in local.logged_buckets : bucket => bucket
    if var.enable_logging
  }

  bucket = each.value

  target_bucket = aws_s3_bucket.doc_buckets[local.bucket_names.compliance_logs].id
  target_prefix = "${each.value}/"
}

# ============================================
# IAM Role for S3 Replication
# ============================================
resource "aws_iam_role" "replication_role" {
  name = "${var.project_name}-replication-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "s3.amazonaws.com"
        }
      }
    ]
  })

  tags = {
    Name = "${var.project_name}-replication-role"
  }
}

# ============================================
# IAM Policy for S3 Replication
# ============================================
resource "aws_iam_policy" "replication_policy" {
  name        = "${var.project_name}-replication-policy"
  description = "Policy for S3 bucket replication"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowS3GetReplicationConfiguration"
        Effect = "Allow"
        Action = [
          "s3:GetReplicationConfiguration",
          "s3:ListBucket"
        ]
        Resource = [
          aws_s3_bucket.doc_buckets["uploads"].arn,
          aws_s3_bucket.doc_buckets["processed_output"].arn
        ]
      },
      {
        Sid    = "AllowS3GetObjectVersions"
        Effect = "Allow"
        Action = [
          "s3:GetObjectVersionForReplication",
          "s3:GetObjectVersionAcl",
          "s3:GetObjectVersionTagging"
        ]
        Resource = [
          "${aws_s3_bucket.doc_buckets[local.bucket_names.uploads].arn}/*",
          "${aws_s3_bucket.doc_buckets[local.bucket_names.processed_output].arn}/*"
        ]
      },
      {
        Sid    = "AllowS3ReplicateObjects"
        Effect = "Allow"
        Action = [
          "s3:ReplicateObject",
          "s3:ReplicateDelete",
          "s3:ReplicateTags"
        ]
        Resource = [
          "${aws_s3_bucket.doc_buckets[local.bucket_names.internal_processing].arn}/*",
          "${aws_s3_bucket.doc_buckets[local.bucket_names.delivery].arn}/*"
        ]
      }
    ]
  })
}

# Attach policy to role
resource "aws_iam_role_policy_attachment" "replication_attach" {
  role       = aws_iam_role.replication_role.name
  policy_arn = aws_iam_policy.replication_policy.arn
}

# ============================================
# S3 Replication: uploads → internal-processing
# ============================================
resource "aws_s3_bucket_replication_configuration" "uploads_to_internal" {
  depends_on = [
    aws_s3_bucket_versioning.versioning,
    aws_iam_role_policy_attachment.replication_attach
  ]

  role   = aws_iam_role.replication_role.arn
  bucket = aws_s3_bucket.doc_buckets["uploads"].id

  rule {
    id     = "ReplicateAllUploads"
    status = "Enabled"

    filter {}  # Replicate all objects

    destination {
      bucket        = aws_s3_bucket.doc_buckets["internal_processing"].arn
      storage_class = "STANDARD"
    }

    delete_marker_replication {
      status = "Enabled"
    }
  }
}

# ============================================
# S3 Replication: processed-output → delivery
# ============================================
resource "aws_s3_bucket_replication_configuration" "processed_to_delivery" {
  depends_on = [
    aws_s3_bucket_versioning.versioning,
    aws_iam_role_policy_attachment.replication_attach
  ]

  role   = aws_iam_role.replication_role.arn
  bucket = aws_s3_bucket.doc_buckets["processed_output"].id

  rule {
    id     = "ReplicateProcessedFiles"
    status = "Enabled"

    filter {}  # Replicate all objects

    destination {
      bucket        = aws_s3_bucket.doc_buckets["delivery"].arn
      storage_class = "STANDARD"
    }

    delete_marker_replication {
      status = "Enabled"
    }
  }
}

# ============================================
# IAM User for Third Party
# ============================================
resource "aws_iam_user" "third_party_user" {
  name = "${var.project_name}-third-party-user"

  tags = {
    Name        = "${var.project_name}-third-party-user"
    Description = "Restricted IAM user for third-party document uploads and downloads"
  }
}

# ============================================
# IAM Policy for Third Party User
# ============================================
data "aws_iam_policy_document" "third_party_policy_doc" {
  # Allow uploads to uploads bucket
  statement {
    sid    = "AllowUploadsToUploadsBucket"
    effect = "Allow"
    actions = [
      "s3:PutObject",
      "s3:PutObjectAcl"
    ]
    resources = [
      "${aws_s3_bucket.doc_buckets[local.bucket_names.uploads].arn}/*"
    ]
  }

  # Allow downloads from delivery bucket
  statement {
    sid    = "AllowDownloadsFromDeliveryBucket"
    effect = "Allow"
    actions = [
      "s3:GetObject",
      "s3:GetObjectVersion"
    ]
    resources = [
      "${aws_s3_bucket.doc_buckets[local.bucket_names.delivery].arn}/*"
    ]
  }

  # Allow listing objects in delivery bucket (to see what's available)
  statement {
    sid    = "AllowListDeliveryBucket"
    effect = "Allow"
    actions = [
      "s3:ListBucket",
      "s3:ListBucketVersions"
    ]
    resources = [
      aws_s3_bucket.doc_buckets["delivery"].arn
    ]
  }

  # CRITICAL: Deny all S3 actions if not from trusted IP
  statement {
    sid       = "DenyAllIfNotFromTrustedIP"
    effect    = "Deny"
    actions   = ["s3:*"]
    resources = ["*"]

    condition {
      test     = "NotIpAddress"
      variable = "aws:SourceIp"
      values   = [var.third_party_ip]
    }
  }
}

resource "aws_iam_policy" "third_party_policy" {
  name        = "${var.project_name}-third-party-policy"
  description = "Restricted policy for third-party document uploads and downloads"
  policy      = data.aws_iam_policy_document.third_party_policy_doc.json
}

resource "aws_iam_user_policy_attachment" "third_party_attach" {
  user       = aws_iam_user.third_party_user.name
  policy_arn = aws_iam_policy.third_party_policy.arn
}

# ============================================
# Outputs
# ============================================
output "bucket_names" {
  description = "All created S3 bucket names"
  value       = { for k, v in aws_s3_bucket.doc_buckets : k => v.id }
}

output "bucket_arns" {
  description = "All created S3 bucket ARNs"
  value       = { for k, v in aws_s3_bucket.doc_buckets : k => v.arn }
}

output "third_party_iam_user_name" {
  description = "The IAM username for the third party"
  value       = aws_iam_user.third_party_user.name
}

output "third_party_iam_user_arn" {
  description = "The IAM user ARN for the third party"
  value       = aws_iam_user.third_party_user.arn
}

output "replication_role_arn" {
  description = "The IAM role ARN used for S3 replication"
  value       = aws_iam_role.replication_role.arn
}

output "region" {
  description = "The AWS region where resources are deployed"
  value       = var.aws_region
}

Step 3.3: Create outputs.tf

This file defines what information Terraform will display after deployment.

Create file: terraform/outputs.tf

output "deployment_summary" {
  description = "Summary of deployed infrastructure"
  value = {
    region                 = var.aws_region
    project_name           = var.project_name
    buckets_created        = length(aws_s3_bucket.doc_buckets)
    versioning_enabled     = var.versioning_enabled
    logging_enabled        = var.enable_logging
    replication_configured = true
  }
}

output "uploads_bucket" {
  description = "Upload bucket details (for third party)"
  value = {
    name   = aws_s3_bucket.doc_buckets["uploads"].id
    arn    = aws_s3_bucket.doc_buckets["uploads"].arn
    region = aws_s3_bucket.doc_buckets["uploads"].region
  }
}

output "delivery_bucket" {
  description = "Delivery bucket details (for third party)"
  value = {
    name   = aws_s3_bucket.doc_buckets["delivery"].id
    arn    = aws_s3_bucket.doc_buckets["delivery"].arn
    region = aws_s3_bucket.doc_buckets["delivery"].region
  }
}

output "third_party_instructions" {
  description = "Instructions for creating and using third party credentials"
  value = <<-EOT

  ═══════════════════════════════════════════════════════════════
  NEXT STEPS: Configure Third Party Access
  ═══════════════════════════════════════════════════════════════

  1. Create Access Keys:
     - Go to AWS Console → IAM → Users
     - Find user: ${aws_iam_user.third_party_user.name}
     - Click "Security credentials" tab
     - Click "Create access key" → Choose "Third-party service"
     - Save the Access Key ID and Secret Access Key securely

  2. Configure AWS CLI Profile:
     aws configure --profile third-party-test
     # Enter the access key and secret key from step 1
     # Default region: ${var.aws_region}
     # Default output: json

  3. Test Upload:
     echo "test data" > test.pdf
     aws s3 cp test.pdf s3://${aws_s3_bucket.doc_buckets["uploads"].id}/ --profile third-party-test

  4. Verify Replication:
     Check the internal-processing bucket in console after ~5 minutes

  5. Test Download (after Part 2 processing):
     aws s3 cp s3://${aws_s3_bucket.doc_buckets["delivery"].id}/processed-test.pdf . --profile third-party-test

  ═══════════════════════════════════════════════════════════════
  EOT
}

Step 3.4: Create terraform.tfvars

This file contains your specific values for the variables.

Create file: terraform/terraform.tfvars

# AWS Configuration
aws_region   = "ap-south-1"
project_name = "secure-doc-pipeline"

# IMPORTANT: Replace with your actual IP address
# Find your IP at: https://whatismyipaddress.com/
# Format: "xxx.xxx.xxx.xxx/32"
third_party_ip = "YOUR_IP_ADDRESS/32"

# Feature Flags
enable_logging        = true
force_destroy_buckets = true  # Set to false for production
versioning_enabled    = true

⚠️ IMPORTANT: Before proceeding, update the third_party_ip value with your actual IP address!
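
If you want to script that step, the following sketch writes your current public IP into terraform.tfvars (it assumes Linux sed; on macOS use sed -i '', and re-run it whenever your public IP changes):

# Replace the placeholder with your current public IP in /32 CIDR form
MY_IP=$(curl -s https://checkip.amazonaws.com)
sed -i "s|YOUR_IP_ADDRESS/32|${MY_IP}/32|" terraform.tfvars
grep third_party_ip terraform.tfvars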

Phase 4: Deploy the Infrastructure

Step 4.1: Initialize Terraform

This downloads the AWS provider and prepares your workspace:

cd terraform
terraform init

Expected output:

Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.x.x...

Terraform has been successfully initialized!

Step 4.2: Validate Configuration

Check for syntax errors:

terraform validate

Expected output:

Success! The configuration is valid.

Step 4.3: Preview Changes

See what Terraform will create:

terraform plan

Review the output carefully. You should see:

  • 5 S3 buckets
  • Multiple S3 bucket configurations (versioning, encryption, etc.)
  • 1 IAM role for replication
  • 1 IAM policy for replication
  • 1 IAM user for third party
  • 1 IAM policy for third party
  • 2 S3 replication configurations

Step 4.4: Apply Configuration

Deploy the infrastructure:

terraform apply

Type yes when prompted.

Deployment time: 2-3 minutes

Step 4.5: Save Outputs

After successful deployment, save the outputs:

terraform output > deployment-info.txt
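
If you plan to script against these values in later phases, Terraform can also emit machine-readable output; a small sketch (-raw works for string outputs such as region):

terraform output -json > deployment-info.json

# Read a single string output, e.g. the deployment region
terraform output -raw region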

Phase 5: Create Access Keys for Third Party User

  1. Log in to AWS Console
  2. Navigate to IAM → Users
  3. Find and click on secure-doc-pipeline-third-party-user
  4. Click the Security credentials tab
  5. Scroll to Access keys section
  6. Click Create access key
  7. Select use case: Third-party service
  8. Click Next, add optional description tag
  9. Click Create access key
  10. ⚠️ CRITICAL: Click Download .csv file and save it securely
  11. Store the credentials in a password manager

Via AWS CLI (Alternative)

aws iam create-access-key --user-name secure-doc-pipeline-third-party-user

# Save the output AccessKeyId and SecretAccessKey securely
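
If you only want the two credential fields, the same call can be filtered with --query (a sketch; treat the printed values as secrets and keep them out of shared terminals and logs):

aws iam create-access-key \
  --user-name secure-doc-pipeline-third-party-user \
  --query 'AccessKey.[AccessKeyId,SecretAccessKey]' \
  --output text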

Phase 6: Configure AWS CLI Profile for Testing

Create a separate AWS CLI profile for the third-party user:

aws configure --profile third-party-test

Enter the following when prompted:

AWS Access Key ID: [Enter Access Key from previous step]
AWS Secret Access Key: [Enter Secret Access Key from previous step]
Default region name: ap-south-1
Default output format: json

Verify Profile

aws sts get-caller-identity --profile third-party-test

Expected output:

{
  "UserId": "AIDAXXXXXXXXXXXXX",
  "Account": "123456789012",
  "Arn": "arn:aws:iam::123456789012:user/secure-doc-pipeline-third-party-user"
}

Phase 7: Test the Infrastructure

Test 1: Successful Upload to Uploads Bucket

# Create a test file
echo "This is a test document for the secure pipeline" > test-upload.pdf

# Upload to uploads bucket (should succeed)
aws s3 cp test-upload.pdf s3://secure-doc-pipeline-uploads/ --profile third-party-test

Expected output:

upload: ./test-upload.pdf to s3://secure-doc-pipeline-uploads/test-upload.pdf

Test 2: Verify S3 Replication

Wait 2-5 minutes for replication to complete, then check:

# List objects in internal-processing bucket using your admin profile
aws s3 ls s3://secure-doc-pipeline-internal-processing/

# Or check via console:
# Navigate to S3 → secure-doc-pipeline-internal-processing
# You should see test-upload.pdf
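
If you would rather poll than guess at timing, a small loop can wait for the replica to appear (a sketch using the default bucket names; it gives up after roughly five minutes):

# Check every 10 seconds for the replicated object
for i in $(seq 1 30); do
  aws s3api head-object \
    --bucket secure-doc-pipeline-internal-processing \
    --key test-upload.pdf >/dev/null 2>&1 && { echo "Replicated."; break; }
  echo "Waiting for replication... ($i/30)"
  sleep 10
done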

Test 3: Unauthorized Upload (Should Fail)

# Try to upload to delivery bucket (should be denied)
aws s3 cp test-upload.pdf s3://secure-doc-pipeline-delivery/ --profile third-party-test

Expected output:

upload failed: ./test-upload.pdf to s3://secure-doc-pipeline-delivery/test-upload.pdf
An error occurred (AccessDenied) when calling the PutObject operation: Access Denied

This is correct behavior! Third party should only upload to uploads bucket.

Test 4: Unauthorized Bucket Listing (Should Fail)

# Try to list all buckets (should be denied)
aws s3 ls --profile third-party-test

Expected output:

An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied

This is correct behavior! Third party has no permission to list buckets.

Test 5: Download from Delivery Bucket

# First, manually copy a file to delivery bucket using admin profile
aws s3 cp test-upload.pdf s3://secure-doc-pipeline-delivery/processed-test.pdf

# Now try to download with third-party profile (should succeed)
aws s3 cp s3://secure-doc-pipeline-delivery/processed-test.pdf ./downloaded-file.pdf --profile third-party-test

Expected output:

download: s3://secure-doc-pipeline-delivery/processed-test.pdf to ./downloaded-file.pdf

Test 6: List Delivery Bucket Contents

# Third party can list delivery bucket to see available files
aws s3 ls s3://secure-doc-pipeline-delivery/ --profile third-party-test

Expected output:

2025-10-16 10:30:45       48 processed-test.pdf

Phase 8: Verify Server Access Logging

Check that access logs are being generated:

# Wait 5-10 minutes after performing actions, then check
aws s3 ls s3://secure-doc-pipeline-compliance-logs/ --recursive

You should see log files organized by source bucket:

2025-10-16 11:00:00   1234 secure-doc-pipeline-uploads/2025-10-16-11-00-00-XXXXX
2025-10-16 11:01:00   987  secure-doc-pipeline-delivery/2025-10-16-11-01-00-XXXXX
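
To inspect what a log record actually contains, copy one of the log objects to stdout (a sketch that assumes at least one log object has already been delivered):

# Print the first access log object in the compliance bucket
KEY=$(aws s3api list-objects-v2 \
  --bucket secure-doc-pipeline-compliance-logs \
  --query 'Contents[0].Key' --output text)
aws s3 cp "s3://secure-doc-pipeline-compliance-logs/${KEY}" -

Each record includes the requester, operation, object key, and HTTP status, which is the audit trail the compliance-logs bucket exists to provide.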

Troubleshooting

Issue: Replication Not Working

Symptoms: Files uploaded to uploads bucket don’t appear in internal-processing bucket

Solutions:

  1. Check versioning is enabled:

    aws s3api get-bucket-versioning --bucket secure-doc-pipeline-uploads
    

    Should show: "Status": "Enabled"

  2. Check replication configuration:

    aws s3api get-bucket-replication --bucket secure-doc-pipeline-uploads
    
  3. Wait longer: Replication can take 2-15 minutes depending on file size

  4. Check replication status:

    aws s3api head-object --bucket secure-doc-pipeline-uploads --key test-upload.pdf
    

    Look for "ReplicationStatus": "COMPLETED" or "PENDING"

  5. Verify IAM role permissions:

    aws iam get-role --role-name secure-doc-pipeline-replication-role
    

Issue: Access Denied for Third Party User

Symptoms: Third party gets AccessDenied error even for allowed operations

Solutions:

  1. Verify the IP address is correct: compare third_party_ip in terraform.tfvars with your current public IP (it changes when you switch networks, reconnect a VPN, or your ISP reassigns it)

  2. Update IP and redeploy:

    # Update terraform.tfvars with new IP
    terraform apply
    
  3. Check policy is attached:

    aws iam list-attached-user-policies --user-name secure-doc-pipeline-third-party-user
    
  4. Test from correct IP:

    • If using VPN, ensure VPN is connected
    • If testing from different location, IP will be different

Issue: Terraform Apply Fails

Common errors and solutions:

  1. Error: “BucketAlreadyExists”

    • Bucket names must be globally unique
    • Change project_name in terraform.tfvars to something unique
    • Example: secure-doc-pipeline-yourname-20251016
  2. Error: “Access Denied” during Terraform apply

    • Check AWS CLI is configured: aws sts get-caller-identity
    • Verify your AWS user has required permissions
    • Try with admin credentials for initial setup
  3. Error: “InvalidBucketName”

    • Bucket names can only contain lowercase letters, numbers, and hyphens
    • Must be 3-63 characters long
    • Cannot start or end with a hyphen

Issue: Can’t Delete Buckets

Symptoms: terraform destroy fails because buckets contain objects

Solution:

# Empty all buckets first
aws s3 rm s3://secure-doc-pipeline-uploads/ --recursive
aws s3 rm s3://secure-doc-pipeline-internal-processing/ --recursive
aws s3 rm s3://secure-doc-pipeline-processed-output/ --recursive
aws s3 rm s3://secure-doc-pipeline-delivery/ --recursive
aws s3 rm s3://secure-doc-pipeline-compliance-logs/ --recursive

# Then run destroy
terraform destroy
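
Note that with versioning enabled, aws s3 rm --recursive removes only current objects; noncurrent versions and delete markers remain. Because this project sets force_destroy = true, terraform destroy will purge those for you, but if you ever need to empty a versioned bucket by hand, here is a sketch for one bucket (it assumes fewer than 1,000 versions and that both listings are non-empty):

BUCKET=secure-doc-pipeline-uploads

# Delete every object version
aws s3api delete-objects --bucket "$BUCKET" --delete "$(aws s3api list-object-versions \
  --bucket "$BUCKET" \
  --query '{Objects: Versions[].{Key: Key, VersionId: VersionId}}' --output json)"

# Delete every delete marker
aws s3api delete-objects --bucket "$BUCKET" --delete "$(aws s3api list-object-versions \
  --bucket "$BUCKET" \
  --query '{Objects: DeleteMarkers[].{Key: Key, VersionId: VersionId}}' --output json)"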

Security Best Practices

Implemented in This Part

  ✅ Encryption at rest (AES-256)
  ✅ Versioning enabled (recover from accidental deletions)
  ✅ Public access blocked (no public internet access)
  ✅ IP-based restrictions (third party can only access from trusted IP)
  ✅ Principle of least privilege (third party has minimal permissions)
  ✅ Server access logging (audit trail for compliance)
  ✅ IAM policy conditions (deny if not from trusted IP)

Additional Recommendations for Production

  1. Enable MFA for IAM user deletion
  2. Use AWS Organizations for account isolation
  3. Implement S3 Object Lock for regulatory compliance
  4. Enable AWS Config for continuous compliance monitoring
  5. Use VPC Endpoints for private S3 access (covered in Part 3)
  6. Implement S3 Bucket Keys for encryption cost reduction (already done)
  7. Set up AWS CloudTrail for detailed API logging (covered in Part 3)

Cost Estimation

Part 1 Monthly Costs (Approximate)

S3 Storage (ap-south-1 pricing):

  • 5 GB storage: ~$0.12/month
  • Versioning overhead (2x): ~$0.12/month
  • Total Storage: ~$0.24/month

S3 Requests:

  • 1,000 PUT requests: ~$0.005
  • 1,000 GET requests: ~$0.0004
  • Total Requests: ~$0.01/month

S3 Replication:

  • Data transfer (same region): $0.00 (free)
  • Replication PUT requests: ~$0.005/1000 objects
  • Total Replication: ~$0.01/month

Server Access Logs:

  • Storage for logs: ~$0.05/month
  • Total Logging: ~$0.05/month

IAM:

  • IAM users, roles, policies: $0.00 (free)

Total Estimated Cost for Part 1: ~$0.31/month

With 100 documents/month (avg 2 MB each):

  • Storage: ~$0.50/month
  • Requests: ~$0.10/month
  • Total: ~$0.60/month

Cost Optimization Tips

  1. Enable lifecycle policies (already configured):

    • Delete old versions after 90 days
    • Saves on storage costs
  2. Use S3 Intelligent-Tiering for infrequently accessed files:

    • Automatic cost savings for files not accessed after 30 days
  3. Set up AWS Budgets (a sample budget.json sketch follows this list):

    # Create budget alert
    aws budgets create-budget --account-id YOUR_ACCOUNT_ID \
      --budget file://budget.json
    
  4. Clean up test files regularly:

    aws s3 rm s3://secure-doc-pipeline-uploads/ --recursive
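
Expanding on tip 3 above, the budget file passed to the CLI is not shown there; a minimal budget.json sketch (hypothetical $5/month cost limit) could be:

cat > budget.json <<'EOF'
{
  "BudgetName": "secure-doc-pipeline-monthly",
  "BudgetLimit": { "Amount": "5", "Unit": "USD" },
  "BudgetType": "COST",
  "TimeUnit": "MONTHLY"
}
EOF

Add --notifications-with-subscribers to the create-budget call if you also want an email alert when the threshold is crossed.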
    

Verification Checklist

Before moving to Part 2, verify:

  • All 5 S3 buckets created successfully
  • Versioning enabled on all buckets
  • Encryption enabled on all buckets
  • Public access blocked on all buckets
  • Replication working: uploads → internal-processing
  • Replication working: processed-output → delivery
  • Third-party IAM user created
  • Access keys created for third-party user
  • Third-party can upload to uploads bucket
  • Third-party can download from delivery bucket
  • Third-party CANNOT upload to other buckets
  • Third-party CANNOT list all buckets
  • Server access logging enabled
  • Log files appearing in compliance-logs bucket
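
Most of these checks can be confirmed from the CLI in one pass (a sketch using the default project name and your admin credentials):

# Print versioning status, SSE algorithm, and BlockPublicAcls for each bucket
PROJECT=secure-doc-pipeline
for SUFFIX in uploads internal-processing processed-output delivery compliance-logs; do
  BUCKET="${PROJECT}-${SUFFIX}"
  echo "== ${BUCKET} =="
  aws s3api get-bucket-versioning   --bucket "$BUCKET" --query Status --output text
  aws s3api get-bucket-encryption   --bucket "$BUCKET" \
    --query 'ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm' --output text
  aws s3api get-public-access-block --bucket "$BUCKET" \
    --query 'PublicAccessBlockConfiguration.BlockPublicAcls' --output text
done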

Next Steps

Congratulations! You’ve successfully built a secure, production-ready S3 infrastructure with:

  • 5 S3 buckets with proper security controls
  • Automatic cross-bucket replication
  • Restricted third-party access with IP filtering
  • Comprehensive logging for compliance

Ready for Part 2?

Part 2 adds the Lambda function that automatically processes documents as they arrive in the internal-processing bucket.

Proceed to: AWS Secure Document Pipeline - Part 2: Lambda Function for Document Processing
