Learn how to build a secure, production-grade S3-based document processing pipeline using Infrastructure as Code. Complete Terraform setup with 5 S3 buckets, automatic replication, and strict IAM policies.
Building a secure, scalable document processing pipeline is crucial for modern applications that handle sensitive documents. This comprehensive guide walks you through creating a production-grade S3-based document processing pipeline using Infrastructure as Code (Terraform). This foundation includes 5 S3 buckets with automatic replication, strict IAM policies, and comprehensive security controls.
Our secure document processing pipeline provides multiple layers of protection and automation:
┌─────────────────────────────────────────────────────────────────────┐
│                         Third Party Client                          │
│                      (IP-Restricted IAM User)                       │
└────────────┬──────────────────────────────────────┬─────────────────┘
             │ Upload PDF                           │ Download Result
             ↓                                      ↑
    ┌─────────────────┐                    ┌─────────────────┐
    │ uploads bucket  │                    │ delivery bucket │
    │  (Versioned +   │                    │  (Versioned +   │
    │   Encrypted)    │                    │   Encrypted)    │
    └────────┬────────┘                    └────────┬────────┘
             │                                      ↑
             │ S3 Replication                       │ S3 Replication
             ↓                                      │
┌────────────────────────┐             ┌─────────────────────────┐
│  internal-processing   │   Lambda    │ processed-output bucket │
│         bucket         │   Process   │      (Versioned +       │
│      (Versioned +      │────────────→│       Encrypted)        │
│       Encrypted)       │  (Phase 2)  │                         │
└────────────────────────┘             └─────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                      compliance-logs bucket                      │
│         (Receives server access logs from all buckets)           │
└─────────────────────────────────────────────────────────────────┘
The flow through the five buckets works like this:
1. The third party uploads documents to the uploads bucket (restricted by IP address)
2. Uploads replicate automatically to the internal-processing bucket
3. Processed results land in the processed-output bucket (Phase 2)
4. Results replicate automatically to the delivery bucket
5. The third party downloads results from the delivery bucket
6. Server access logs from all buckets land in the compliance-logs bucket
AWS CLI Installation:
Windows:
# Using MSI installer
# Download from: https://awscli.amazonaws.com/AWSCLIV2.msi
msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi
Linux:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
macOS:
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /
Verify Installation:
aws --version
# Expected output: aws-cli/2.x.x Python/3.x.x...
Configure AWS CLI:
aws configure
# AWS Access Key ID: [Enter your access key]
# AWS Secret Access Key: [Enter your secret key]
# Default region name: ap-south-1
# Default output format: json
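Before moving on, it is worth confirming that the admin credentials actually work; this is a standard AWS CLI identity check with nothing project-specific:
# Verify the CLI authenticates as your admin identity
aws sts get-caller-identity
# The Account and Arn in the response should match your own AWS account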
Terraform Installation:
Windows (using Chocolatey):
choco install terraform
Linux:
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform
macOS:
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
Verify Installation:
terraform version
# Expected output: Terraform v1.5.x or higher
Create a dedicated directory for this project:
# Navigate to your projects folder
cd /path/to/your/projects
# Create project directory
mkdir secure-doc-pipeline
cd secure-doc-pipeline
# Create Terraform directory
mkdir terraform
cd terraform
Your directory structure will look like:
secure-doc-pipeline/
└── terraform/
├── main.tf
├── variables.tf
├── outputs.tf
└── terraform.tfvars
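If you like, you can create the four empty files up front (optional; Terraform only cares that they exist with the right content when you run it):
# Optional: create the empty Terraform files inside terraform/
touch main.tf variables.tf outputs.tf terraform.tfvars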
variables.tf: This file defines all configurable parameters for the infrastructure.
Create file: terraform/variables.tf
variable "aws_region" {
description = "The AWS region to deploy resources in"
type = string
default = "ap-south-1"
}
variable "project_name" {
description = "A unique name for the project to prefix all resources"
type = string
default = "secure-doc-pipeline"
}
variable "third_party_ip" {
description = "The trusted IP address of the third party (CIDR notation)"
type = string
# IMPORTANT: Replace this with your actual IP address
# Find your IP at: https://whatismyipaddress.com/
# Add /32 at the end for a single IP
default = "YOUR_IP_ADDRESS/32"
}
variable "enable_logging" {
description = "Enable S3 server access logging for compliance"
type = bool
default = true
}
variable "force_destroy_buckets" {
description = "Allow buckets to be destroyed even if they contain objects (useful for dev/test)"
type = bool
default = true # Set to false for production
}
variable "versioning_enabled" {
description = "Enable versioning on all buckets"
type = bool
default = true
}
main.tf: This is the core infrastructure definition file.
Create file: terraform/main.tf
# ============================================
# Provider Configuration
# ============================================
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = var.project_name
ManagedBy = "Terraform"
Environment = "Development"
Purpose = "Secure Document Processing Pipeline"
}
}
}
# ============================================
# Local Variables
# ============================================
locals {
# Define all bucket names
bucket_names = {
uploads = "${var.project_name}-uploads"
internal_processing = "${var.project_name}-internal-processing"
processed_output = "${var.project_name}-processed-output"
delivery = "${var.project_name}-delivery"
compliance_logs = "${var.project_name}-compliance-logs"
}
# Buckets that will have replication enabled
replication_source_buckets = [
local.bucket_names.uploads,
local.bucket_names.processed_output
]
# Buckets that will send logs to compliance bucket
logged_buckets = [
local.bucket_names.uploads,
local.bucket_names.internal_processing,
local.bucket_names.processed_output,
local.bucket_names.delivery
]
}
# ============================================
# S3 Buckets Creation
# ============================================
resource "aws_s3_bucket" "doc_buckets" {
for_each = local.bucket_names
bucket = each.value
# Allow Terraform to destroy bucket even with objects (dev/test only)
force_destroy = var.force_destroy_buckets
tags = {
Name = each.value
Type = each.key
}
}
# ============================================
# S3 Bucket Versioning
# ============================================
resource "aws_s3_bucket_versioning" "versioning" {
for_each = aws_s3_bucket.doc_buckets
bucket = each.value.id
versioning_configuration {
status = var.versioning_enabled ? "Enabled" : "Suspended"
}
}
# ============================================
# S3 Bucket Encryption (AES256)
# ============================================
resource "aws_s3_bucket_server_side_encryption_configuration" "encryption" {
for_each = aws_s3_bucket.doc_buckets
bucket = each.value.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
bucket_key_enabled = true
}
}
# ============================================
# S3 Public Access Block (Security Best Practice)
# ============================================
resource "aws_s3_bucket_public_access_block" "pab" {
for_each = aws_s3_bucket.doc_buckets
bucket = each.value.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
# ============================================
# S3 Bucket Lifecycle Rules (Cost Optimization)
# ============================================
resource "aws_s3_bucket_lifecycle_configuration" "lifecycle" {
for_each = {
for k,v in aws_s3_bucket.doc_buckets : k => v
if k != "compliance_logs"
}
bucket = each.value.id
rule {
id = "delete-old-versions"
status = "Enabled"
filter {} # Apply to all objects
noncurrent_version_expiration {
noncurrent_days = 90
}
}
rule {
id = "cleanup-incomplete-uploads"
status = "Enabled"
filter {} # Apply to all objects
abort_incomplete_multipart_upload {
days_after_initiation = 7
}
}
}
# ============================================
# S3 Server Access Logging
# ============================================
resource "aws_s3_bucket_logging" "access_logging" {
for_each = {
for bucket in local.logged_buckets : bucket => bucket
if var.enable_logging
}
bucket = each.value
target_bucket = aws_s3_bucket.doc_buckets[local.bucket_names.compliance_logs].id
target_prefix = "${each.value}/"
}
# ============================================
# IAM Role for S3 Replication
# ============================================
resource "aws_iam_role" "replication_role" {
name = "${var.project_name}-replication-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "s3.amazonaws.com"
}
}
]
})
tags = {
Name = "${var.project_name}-replication-role"
}
}
# ============================================
# IAM Policy for S3 Replication
# ============================================
resource "aws_iam_policy" "replication_policy" {
name = "${var.project_name}-replication-policy"
description = "Policy for S3 bucket replication"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "AllowS3GetReplicationConfiguration"
Effect = "Allow"
Action = [
"s3:GetReplicationConfiguration",
"s3:ListBucket"
]
Resource = [
aws_s3_bucket.doc_buckets["uploads"].arn,
aws_s3_bucket.doc_buckets["processed_output"].arn
]
},
{
Sid = "AllowS3GetObjectVersions"
Effect = "Allow"
Action = [
"s3:GetObjectVersionForReplication",
"s3:GetObjectVersionAcl",
"s3:GetObjectVersionTagging"
]
Resource = [
"${aws_s3_bucket.doc_buckets[local.bucket_names.uploads].arn}/*",
"${aws_s3_bucket.doc_buckets[local.bucket_names.processed_output].arn}/*"
]
},
{
Sid = "AllowS3ReplicateObjects"
Effect = "Allow"
Action = [
"s3:ReplicateObject",
"s3:ReplicateDelete",
"s3:ReplicateTags"
]
Resource = [
"${aws_s3_bucket.doc_buckets[local.bucket_names.internal_processing].arn}/*",
"${aws_s3_bucket.doc_buckets[local.bucket_names.delivery].arn}/*"
]
}
]
})
}
# Attach policy to role
resource "aws_iam_role_policy_attachment" "replication_attach" {
role = aws_iam_role.replication_role.name
policy_arn = aws_iam_policy.replication_policy.arn
}
# ============================================
# S3 Replication: uploads → internal-processing
# ============================================
resource "aws_s3_bucket_replication_configuration" "uploads_to_internal" {
depends_on = [
aws_s3_bucket_versioning.versioning,
aws_iam_role_policy_attachment.replication_attach
]
role = aws_iam_role.replication_role.arn
bucket = aws_s3_bucket.doc_buckets["uploads"].id
rule {
id = "ReplicateAllUploads"
status = "Enabled"
filter {} # Replicate all objects
destination {
bucket = aws_s3_bucket.doc_buckets["internal_processing"].arn
storage_class = "STANDARD"
}
delete_marker_replication {
status = "Enabled"
}
}
}
# ============================================
# S3 Replication: processed-output → delivery
# ============================================
resource "aws_s3_bucket_replication_configuration" "processed_to_delivery" {
depends_on = [
aws_s3_bucket_versioning.versioning,
aws_iam_role_policy_attachment.replication_attach
]
role = aws_iam_role.replication_role.arn
bucket = aws_s3_bucket.doc_buckets["processed_output"].id
rule {
id = "ReplicateProcessedFiles"
status = "Enabled"
filter {} # Replicate all objects
destination {
bucket = aws_s3_bucket.doc_buckets["delivery"].arn
storage_class = "STANDARD"
}
delete_marker_replication {
status = "Enabled"
}
}
}
# ============================================
# IAM User for Third Party
# ============================================
resource "aws_iam_user" "third_party_user" {
name = "${var.project_name}-third-party-user"
tags = {
Name = "${var.project_name}-third-party-user"
Description = "Restricted IAM user for third-party document uploads and downloads"
}
}
# ============================================
# IAM Policy for Third Party User
# ============================================
data "aws_iam_policy_document" "third_party_policy_doc" {
# Allow uploads to uploads bucket
statement {
sid = "AllowUploadsToUploadsBucket"
effect = "Allow"
actions = [
"s3:PutObject",
"s3:PutObjectAcl"
]
resources = [
"${aws_s3_bucket.doc_buckets[local.bucket_names.uploads].arn}/*"
]
}
# Allow downloads from delivery bucket
statement {
sid = "AllowDownloadsFromDeliveryBucket"
effect = "Allow"
actions = [
"s3:GetObject",
"s3:GetObjectVersion"
]
resources = [
"${aws_s3_bucket.doc_buckets[local.bucket_names.delivery].arn}/*"
]
}
# Allow listing objects in delivery bucket (to see what's available)
statement {
sid = "AllowListDeliveryBucket"
effect = "Allow"
actions = [
"s3:ListBucket",
"s3:ListBucketVersions"
]
resources = [
aws_s3_bucket.doc_buckets["delivery"].arn
]
}
# CRITICAL: Deny all S3 actions if not from trusted IP
statement {
sid = "DenyAllIfNotFromTrustedIP"
effect = "Deny"
actions = ["s3:*"]
resources = ["*"]
condition {
test = "NotIpAddress"
variable = "aws:SourceIp"
values = [var.third_party_ip]
}
}
}
resource "aws_iam_policy" "third_party_policy" {
name = "${var.project_name}-third-party-policy"
description = "Restricted policy for third-party document uploads and downloads"
policy = data.aws_iam_policy_document.third_party_policy_doc.json
}
resource "aws_iam_user_policy_attachment" "third_party_attach" {
user = aws_iam_user.third_party_user.name
policy_arn = aws_iam_policy.third_party_policy.arn
}
# ============================================
# Outputs
# ============================================
output "bucket_names" {
description = "All created S3 bucket names"
value = { for k, v in aws_s3_bucket.doc_buckets : k => v.id }
}
output "bucket_arns" {
description = "All created S3 bucket ARNs"
value = { for k, v in aws_s3_bucket.doc_buckets : k => v.arn }
}
output "third_party_iam_user_name" {
description = "The IAM username for the third party"
value = aws_iam_user.third_party_user.name
}
output "third_party_iam_user_arn" {
description = "The IAM user ARN for the third party"
value = aws_iam_user.third_party_user.arn
}
output "replication_role_arn" {
description = "The IAM role ARN used for S3 replication"
value = aws_iam_role.replication_role.arn
}
output "region" {
description = "The AWS region where resources are deployed"
value = var.aws_region
}
outputs.tf: This file defines what information Terraform will display after deployment.
Create file: terraform/outputs.tf
output "deployment_summary" {
description = "Summary of deployed infrastructure"
value = {
region = var.aws_region
project_name = var.project_name
buckets_created = length(aws_s3_bucket.doc_buckets)
versioning_enabled = var.versioning_enabled
logging_enabled = var.enable_logging
replication_configured = true
}
}
output "uploads_bucket" {
description = "Upload bucket details (for third party)"
value = {
name = aws_s3_bucket.doc_buckets["uploads"].id
arn = aws_s3_bucket.doc_buckets["uploads"].arn
region = aws_s3_bucket.doc_buckets["uploads"].region
}
}
output "delivery_bucket" {
description = "Delivery bucket details (for third party)"
value = {
name = aws_s3_bucket.doc_buckets["delivery"].id
arn = aws_s3_bucket.doc_buckets["delivery"].arn
region = aws_s3_bucket.doc_buckets["delivery"].region
}
}
output "third_party_instructions" {
description = "Instructions for creating and using third party credentials"
value = <<-EOT
═══════════════════════════════════════════════════════════════
NEXT STEPS: Configure Third Party Access
═══════════════════════════════════════════════════════════════
1. Create Access Keys:
- Go to AWS Console → IAM → Users
- Find user: ${aws_iam_user.third_party_user.name}
- Click "Security credentials" tab
- Click "Create access key" → Choose "Third-party service"
- Save the Access Key ID and Secret Access Key securely
2. Configure AWS CLI Profile:
aws configure --profile third-party-test
# Enter the access key and secret key from step 1
# Default region: ${var.aws_region}
# Default output: json
3. Test Upload:
echo "test data" > test.pdf
aws s3 cp test.pdf s3://${aws_s3_bucket.doc_buckets["uploads"].id}/ --profile third-party-test
4. Verify Replication:
Check the internal-processing bucket in console after ~5 minutes
5. Test Download (after Phase 2 processing):
aws s3 cp s3://${aws_s3_bucket.doc_buckets["delivery"].id}/processed-test.pdf . --profile third-party-test
═══════════════════════════════════════════════════════════════
EOT
}
terraform.tfvars: This file contains your specific values for the variables.
Create file: terraform/terraform.tfvars
# AWS Configuration
aws_region = "ap-south-1"
project_name = "secure-doc-pipeline"
# IMPORTANT: Replace with your actual IP address
# Find your IP at: https://whatismyipaddress.com/
# Format: "xxx.xxx.xxx.xxx/32"
third_party_ip = "YOUR_IP_ADDRESS/32"
# Feature Flags
enable_logging = true
force_destroy_buckets = true # Set to false for production
versioning_enabled = true
⚠️ IMPORTANT: Before proceeding, update the third_party_ip value with your actual IP address!
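One quick way to look up your current public IP from the terminal (checkip.amazonaws.com is an AWS-operated endpoint; the address shown is only an illustrative example):
# Print your current public IP, then paste it into terraform.tfvars with a /32 suffix
curl -s https://checkip.amazonaws.com
# e.g. third_party_ip = "203.0.113.25/32"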
This downloads the AWS provider and prepares your workspace:
cd terraform
terraform init
Expected output:
Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.x.x...
Terraform has been successfully initialized!
Check for syntax errors:
terraform validate
Expected output:
Success! The configuration is valid.
See what Terraform will create:
terraform plan
Review the output carefully. You should see the five S3 buckets with their versioning, encryption, public access block, lifecycle, and logging settings, the two replication configurations, and the IAM user, role, and policies, all marked as resources to be created.
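If you want a quick sanity count of planned resources, you can save the plan and grep it (optional; these are standard Terraform flags):
# Save the plan, then count how many resources will be created
terraform plan -out=tfplan
terraform show tfplan | grep -c "will be created"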
Deploy the infrastructure:
terraform apply
Type yes when prompted.
Deployment time: 2-3 minutes
After successful deployment, save the outputs:
terraform output > deployment-info.txt
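Individual output values can also be read back later, which is handy when scripting the next steps (the -raw flag prints just the value):
# Read single outputs without quotes or formatting
terraform output -raw third_party_iam_user_name
terraform output -raw region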
Create access keys for the IAM user secure-doc-pipeline-third-party-user:
aws iam create-access-key --user-name secure-doc-pipeline-third-party-user
# Save the output AccessKeyId and SecretAccessKey securely
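As an alternative to the command above (note that each invocation creates a new key, and an IAM user can hold at most two), a --query filter prints only the two values; store the secret securely either way:
# Print only the AccessKeyId and SecretAccessKey (tab-separated)
aws iam create-access-key --user-name secure-doc-pipeline-third-party-user \
--query 'AccessKey.[AccessKeyId,SecretAccessKey]' --output text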
Create a separate AWS CLI profile for the third-party user:
aws configure --profile third-party-test
Enter the following when prompted:
AWS Access Key ID: [Enter Access Key from previous step]
AWS Secret Access Key: [Enter Secret Access Key from previous step]
Default region name: ap-south-1
Default output format: json
aws sts get-caller-identity --profile third-party-test
Expected output:
{
"UserId": "AIDAXXXXXXXXXXXXX",
"Account": "123456789012",
"Arn": "arn:aws:iam::123456789012:user/secure-doc-pipeline-third-party-user"
}
# Create a test file
echo "This is a test document for the secure pipeline" > test-upload.pdf
# Upload to uploads bucket (should succeed)
aws s3 cp test-upload.pdf s3://secure-doc-pipeline-uploads/ --profile third-party-test
Expected output:
upload: ./test-upload.pdf to s3://secure-doc-pipeline-uploads/test-upload.pdf
Wait 2-5 minutes for replication to complete, then check:
# List objects in internal-processing bucket using your admin profile
aws s3 ls s3://secure-doc-pipeline-internal-processing/
# Or check via console:
# Navigate to S3 → secure-doc-pipeline-internal-processing
# You should see test-upload.pdf
# Try to upload to delivery bucket (should be denied)
aws s3 cp test-upload.pdf s3://secure-doc-pipeline-delivery/ --profile third-party-test
Expected output:
upload failed: ./test-upload.pdf to s3://secure-doc-pipeline-delivery/test-upload.pdf
An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
✅ This is correct behavior! Third party should only upload to uploads bucket.
# Try to list all buckets (should be denied)
aws s3 ls --profile third-party-test
Expected output:
An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied
✅ This is correct behavior! Third party has no permission to list buckets.
# First, manually copy a file to delivery bucket using admin profile
aws s3 cp test-upload.pdf s3://secure-doc-pipeline-delivery/processed-test.pdf
# Now try to download with third-party profile (should succeed)
aws s3 cp s3://secure-doc-pipeline-delivery/processed-test.pdf ./downloaded-file.pdf --profile third-party-test
Expected output:
download: s3://secure-doc-pipeline-delivery/processed-test.pdf to ./downloaded-file.pdf
# Third party can list delivery bucket to see available files
aws s3 ls s3://secure-doc-pipeline-delivery/ --profile third-party-test
Expected output:
2025-10-16 10:30:45 48 processed-test.pdf
Check that access logs are being generated:
# Wait 5-10 minutes after performing actions, then check
aws s3 ls s3://secure-doc-pipeline-compliance-logs/ --recursive
You should see log files organized by source bucket:
2025-10-16 11:00:00 1234 secure-doc-pipeline-uploads/2025-10-16-11-00-00-XXXXX
2025-10-16 11:01:00 987 secure-doc-pipeline-delivery/2025-10-16-11-01-00-XXXXX
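To look inside the logs themselves, you can pull a prefix down locally; S3 server access logs are plain text, one request per line, and include the operation name (for example REST.PUT.OBJECT for uploads):
# Download the uploads bucket's access logs and show a few PUT records
aws s3 cp s3://secure-doc-pipeline-compliance-logs/secure-doc-pipeline-uploads/ ./access-logs/ --recursive
grep -h "REST.PUT.OBJECT" ./access-logs/* | head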
Symptoms: Files uploaded to uploads bucket don’t appear in internal-processing bucket
Solutions:
Check versioning is enabled:
aws s3api get-bucket-versioning --bucket secure-doc-pipeline-uploads
Should show: "Status": "Enabled"
Check replication configuration:
aws s3api get-bucket-replication --bucket secure-doc-pipeline-uploads
Wait longer: Replication can take 2-15 minutes depending on file size
Check replication status:
aws s3api head-object --bucket secure-doc-pipeline-uploads --key test-upload.pdf
Look for "ReplicationStatus": "COMPLETED" or "PENDING"
Verify IAM role permissions:
aws iam get-role --role-name secure-doc-pipeline-replication-role
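You can also confirm the replication policy is attached to that role (it should list secure-doc-pipeline-replication-policy):
# List the policies attached to the replication role
aws iam list-attached-role-policies --role-name secure-doc-pipeline-replication-role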
Symptoms: Third party gets AccessDenied error even for allowed operations
Solutions:
Verify IP address is correct:
Check the third_party_ip value in terraform.tfvars against your current public IP. Update the IP and redeploy:
# Update terraform.tfvars with new IP
terraform apply
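A quick way to compare the configured value with your actual public IP before re-applying (assumes you are in the terraform/ directory):
# What the config says vs. what your IP currently is
grep third_party_ip terraform.tfvars
curl -s https://checkip.amazonaws.com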
Check policy is attached:
aws iam list-attached-user-policies --user-name secure-doc-pipeline-third-party-user
Test from the correct IP: run the test commands from the machine whose public IP you configured in terraform.tfvars.
Common errors and solutions:
Error: “BucketAlreadyExists”
S3 bucket names are globally unique, so change project_name in terraform.tfvars to something unique, e.g. secure-doc-pipeline-yourname-20251016, and apply again.
Error: “Access Denied” during Terraform apply
Confirm your admin credentials are configured correctly: aws sts get-caller-identity
Error: “InvalidBucketName”
Bucket names must be 3-63 characters long and contain only lowercase letters, numbers, and hyphens; check project_name for uppercase letters or underscores.
Symptoms: terraform destroy fails because buckets contain objects
Solution:
# Empty all buckets first
aws s3 rm s3://secure-doc-pipeline-uploads/ --recursive
aws s3 rm s3://secure-doc-pipeline-internal-processing/ --recursive
aws s3 rm s3://secure-doc-pipeline-processed-output/ --recursive
aws s3 rm s3://secure-doc-pipeline-delivery/ --recursive
aws s3 rm s3://secure-doc-pipeline-compliance-logs/ --recursive
# Then run destroy
terraform destroy
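Note that on versioned buckets aws s3 rm only removes current versions (by adding delete markers); older versions remain, so terraform destroy can still fail when force_destroy is set to false. With force_destroy left at true (this guide's default), destroy empties the buckets itself. A minimal purge sketch for the force_destroy = false case (assumes fewer than 1000 versions per bucket; repeat or paginate for more):
# Remove all object versions and delete markers from each pipeline bucket
for bucket in secure-doc-pipeline-uploads secure-doc-pipeline-internal-processing \
              secure-doc-pipeline-processed-output secure-doc-pipeline-delivery \
              secure-doc-pipeline-compliance-logs; do
  versions=$(aws s3api list-object-versions --bucket "$bucket" \
    --query '{Objects: Versions[].{Key: Key, VersionId: VersionId}}' --output json)
  echo "$versions" | grep -q VersionId && \
    aws s3api delete-objects --bucket "$bucket" --delete "$versions"
  markers=$(aws s3api list-object-versions --bucket "$bucket" \
    --query '{Objects: DeleteMarkers[].{Key: Key, VersionId: VersionId}}' --output json)
  echo "$markers" | grep -q VersionId && \
    aws s3api delete-objects --bucket "$bucket" --delete "$markers"
done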
✅ Encryption at rest (AES-256)
✅ Versioning enabled (recover from accidental deletions)
✅ Public access blocked (no public internet access)
✅ IP-based restrictions (third party can only access from trusted IP)
✅ Principle of least privilege (third party has minimal permissions)
✅ Server access logging (audit trail for compliance)
✅ IAM policy conditions (deny if not from trusted IP)
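You can spot-check these controls on any bucket with standard s3api calls, for example against the uploads bucket:
# Verify encryption, versioning, public access block, and access logging
aws s3api get-bucket-encryption --bucket secure-doc-pipeline-uploads
aws s3api get-bucket-versioning --bucket secure-doc-pipeline-uploads
aws s3api get-public-access-block --bucket secure-doc-pipeline-uploads
aws s3api get-bucket-logging --bucket secure-doc-pipeline-uploads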
S3 Storage (ap-south-1 pricing):
S3 Requests:
S3 Replication:
Server Access Logs:
IAM:
With 100 documents/month (avg 2 MB each):
Enable lifecycle policies (already configured):
Use S3 Intelligent-Tiering for infrequently accessed files:
Set up AWS Budgets:
# Create budget alert
aws budgets create-budget --account-id YOUR_ACCOUNT_ID \
--budget file://budget.json
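The command expects a budget definition file. A minimal sketch of budget.json (field names follow the AWS Budgets API; the budget name and 10 USD limit are example values):
# Write a minimal monthly cost budget definition
cat > budget.json <<'EOF'
{
  "BudgetName": "secure-doc-pipeline-monthly",
  "BudgetLimit": { "Amount": "10", "Unit": "USD" },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST"
}
EOF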
Clean up test files regularly:
aws s3 rm s3://secure-doc-pipeline-uploads/ --recursive
Before moving to Phase 2, verify that the upload, replication, access-denial, download, and logging tests above all behave as expected.
Congratulations! You’ve successfully built a secure, production-ready S3 infrastructure with the five buckets, automatic replication, access logging, and restrictive IAM controls described above.
Phase 2 will add the Lambda function to automatically process documents as they arrive in the internal-processing bucket.
Proceed to: AWS Secure Document Pipeline - Part 2: Lambda Function for Document Processing.