Learn how to implement production-grade DNS failover with AWS Route 53 for high-availability websites. This guide covers health checks, failover routing, and disaster recovery strategies.
Website downtime costs revenue, damages reputation, and frustrates users. AWS Route 53 DNS failover addresses this by automatically redirecting traffic to a backup resource when health checks report the primary resource as unavailable.
This guide walks through implementing a production-grade DNS failover system with Route 53 so that your website stays reachable during outages, maintenance, or disasters.
Our DNS failover architecture implements a robust, self-healing infrastructure pattern:
User Request → Route 53 → Health Check → Primary Site (if healthy) / Secondary Site (if the primary is unhealthy)
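The flow above can be sketched as a small decision function. This is only an illustration of the routing logic, not Route 53 itself: the function name `route_target` and the `healthy`/`unhealthy` inputs are made up for the example, and the final branch assumes that the primary record is still returned when nothing is healthy.

```shell
# Illustrative sketch of the failover decision (not an AWS API).
# $1 = primary status, $2 = secondary status ("healthy" or "unhealthy").
route_target() {
  primary_status="$1"
  secondary_status="$2"
  if [ "$primary_status" = "healthy" ]; then
    echo "primary"
  elif [ "$secondary_status" = "healthy" ]; then
    echo "secondary"
  else
    # Assumption: with no healthy record, the primary is returned anyway.
    echo "primary"
  fi
}

route_target healthy healthy      # prints: primary
route_target unhealthy healthy    # prints: secondary
```

In this guide the secondary is a static S3 page, so in practice it is treated as always healthy.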
The secondary site serves as your disaster recovery solution. Key considerations:
Step 1: Navigate to S3 Console
Step 2: Configure Bucket Settings
Bucket name: your-project-failover-site (must be globally unique)
Region: a different region from your primary (for example, if the primary is in us-east-1, choose eu-west-1)
Step 3: Create Maintenance Page
Create a professional maintenance page (maintenance.html):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Site Maintenance - We'll Be Back Soon</title>
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: "Segoe UI", Tahoma, Geneva, Verdana, sans-serif;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
min-height: 100vh;
display: flex;
align-items: center;
justify-content: center;
color: white;
}
.maintenance-container {
text-align: center;
max-width: 600px;
padding: 2rem;
background: rgba(255, 255, 255, 0.1);
border-radius: 20px;
backdrop-filter: blur(10px);
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.3);
}
.maintenance-icon {
font-size: 4rem;
margin-bottom: 1rem;
animation: pulse 2s infinite;
}
@keyframes pulse {
0% {
transform: scale(1);
}
50% {
transform: scale(1.1);
}
100% {
transform: scale(1);
}
}
h1 {
font-size: 2.5rem;
margin-bottom: 1rem;
font-weight: 300;
}
.subtitle {
font-size: 1.2rem;
margin-bottom: 2rem;
opacity: 0.9;
}
.message {
font-size: 1.1rem;
line-height: 1.6;
margin-bottom: 2rem;
}
.contact-info {
background: rgba(255, 255, 255, 0.1);
padding: 1.5rem;
border-radius: 10px;
margin-top: 2rem;
}
.contact-info h3 {
margin-bottom: 1rem;
font-size: 1.3rem;
}
.contact-info p {
margin-bottom: 0.5rem;
}
.status-indicator {
display: inline-block;
width: 12px;
height: 12px;
background: #ff6b6b;
border-radius: 50%;
margin-right: 8px;
animation: blink 1s infinite;
}
@keyframes blink {
0%,
50% {
opacity: 1;
}
51%,
100% {
opacity: 0.3;
}
}
.footer {
margin-top: 2rem;
font-size: 0.9rem;
opacity: 0.7;
}
</style>
</head>
<body>
<div class="maintenance-container">
<div class="maintenance-icon">🔧</div>
<h1>Site Maintenance</h1>
<p class="subtitle">We're working hard to improve your experience</p>
<div class="message">
<p>
Our website is currently undergoing scheduled maintenance to enhance
performance and add new features.
</p>
<p>We apologize for any inconvenience and appreciate your patience.</p>
</div>
<div class="contact-info">
<h3>Need Immediate Assistance?</h3>
<p><strong>Email:</strong> support@yourdomain.com</p>
<p><strong>Phone:</strong> +1 (555) 123-4567</p>
<p>
<strong>Status:</strong>
<span class="status-indicator"></span> Maintenance Mode
</p>
</div>
<div class="footer">
<p>
Expected completion: 2-4 hours | Last updated:
<span id="timestamp"></span>
</p>
</div>
</div>
<script>
// Update timestamp
document.getElementById("timestamp").textContent =
new Date().toLocaleString();
</script>
</body>
</html>
Step 4: Upload and Configure Static Website Hosting
Upload maintenance.html to your failover bucket
Enable static website hosting with maintenance.html as the index document
Expected Result: You’ll have a URL like http://your-project-failover-site.s3-website-eu-west-1.amazonaws.com
Route 53 health checks are the monitoring system that determines when to fail over.
Step 1: Navigate to Route 53 Console
Step 2: Basic Configuration
Name: primary-site-health-check
Domain name: your CloudFront domain (e.g., d1234abcd.cloudfront.net)
Step 3: Advanced Configuration
For production environments, configure these settings:
Step 4: String Matching (Optional)
For more precise health checking:
Step 5: Create Health Check
Understanding Health Check Status:
Monitoring Commands:
# Check the current health check status via AWS CLI
aws route53 get-health-check-status --health-check-id YOUR_HEALTH_CHECK_ID
# Review the health check configuration
aws route53 list-health-checks --query "HealthChecks[?Id=='YOUR_HEALTH_CHECK_ID']"
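To get a feel for what the health checkers see, you can approximate the check locally. Route 53 treats an HTTP or HTTPS endpoint as healthy when it answers with a 2xx or 3xx status code. The sketch below mimics that rule with `curl`. The helper names `status_is_healthy` and `probe` are hypothetical, and the 4-second timeout is a rough stand-in rather than the exact timing Route 53 applies.

```shell
# Hypothetical helper: succeed for status codes a health check would
# accept (2xx and 3xx), fail for everything else.
status_is_healthy() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: request a URL and report how the status code
# would be classified. curl prints 000 when no response is received.
probe() {
  code=$(curl -s -o /dev/null -m 4 -w '%{http_code}' "$1")
  if status_is_healthy "$code"; then
    echo "healthy ($code)"
  else
    echo "unhealthy ($code)"
  fi
}

# Usage (requires network access):
#   probe https://d1234abcd.cloudfront.net/
```

A local probe only tests from one vantage point, whereas Route 53 checks from multiple locations, so treat a mismatch as a hint rather than a verdict.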
Step 1: Access Your Hosted Zone
Step 2: Update Primary Record
Health check: primary-site-health-check
Record ID: primary-cloudfront-site
Step 1: Create Secondary A Record
Record name: www
Step 2: Configure Failover Settings
Record ID: secondary-maintenance-site
Expected Result: You should now have two A records for the same domain:
Verification Commands:
# Check DNS records
dig yourdomain.com
nslookup yourdomain.com
# Check health check status
aws route53 get-health-check-status --health-check-id YOUR_HEALTH_CHECK_ID
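For repeatable setups, the primary and secondary failover records can also be expressed as a change batch instead of console clicks. The following is a minimal sketch under several assumptions: both records are alias A records, the helper names `failover_change` and `failover_batch` are invented for this example, and the secondary alias DNS name and hosted zone ID are placeholders you must supply for your own target. The record IDs and the CloudFront hosted zone ID `Z2FDTNDATAQYW2` are the ones used elsewhere in this guide.

```shell
# Emit one UPSERT change for an alias failover record (hypothetical helper).
# $1 name, $2 role (PRIMARY|SECONDARY), $3 record ID, $4 alias DNS name,
# $5 alias hosted zone ID, $6 optional health check ID.
failover_change() {
  health_line=""
  if [ -n "${6:-}" ]; then
    health_line=$(printf '"HealthCheckId": "%s",' "$6")
  fi
  cat <<EOF
{
  "Action": "UPSERT",
  "ResourceRecordSet": {
    "Name": "$1",
    "Type": "A",
    "SetIdentifier": "$3",
    "Failover": "$2",
    $health_line
    "AliasTarget": {
      "DNSName": "$4",
      "HostedZoneId": "$5",
      "EvaluateTargetHealth": false
    }
  }
}
EOF
}

# Build the full change batch with both records.
# $1 name, $2 CloudFront domain, $3 health check ID,
# $4 secondary alias DNS name, $5 secondary alias hosted zone ID.
failover_batch() {
  printf '{"Changes": [\n'
  failover_change "$1" PRIMARY primary-cloudfront-site "$2" Z2FDTNDATAQYW2 "$3"
  printf ',\n'
  failover_change "$1" SECONDARY secondary-maintenance-site "$4" "$5"
  printf ']}\n'
}

# Usage (placeholders throughout):
#   failover_batch yourdomain.com d1234abcd.cloudfront.net YOUR_HEALTH_CHECK_ID \
#     SECONDARY_ALIAS_DNS_NAME SECONDARY_ALIAS_ZONE_ID > failover-batch.json
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id YOUR_HOSTED_ZONE_ID \
#     --change-batch file://failover-batch.json
```

Only the primary record carries a health check here; the secondary is returned whenever the primary is reported unhealthy.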
Method 1: Force Health Check Failure
Edit the primary-site-health-check health check
Temporarily change its target to a value that will fail (e.g., xyz-fail-test-123)
Expected Behavior:
Method 2: Temporarily Disable CloudFront
To restore normal operation:
# Check current DNS resolution
dig yourdomain.com
nslookup yourdomain.com
# Test from different locations
curl -I https://yourdomain.com
curl -I http://yourdomain.com
# Check health check status
aws route53 get-health-check-status --health-check-id YOUR_HEALTH_CHECK_ID
Monthly Cost Breakdown:
Cost Optimization Tips:
Health Check Optimization:
# Tune the failure threshold on an existing health check
# (the request interval, 10 or 30 seconds, is fixed when the check is created)
aws route53 update-health-check \
--health-check-id YOUR_HEALTH_CHECK_ID \
--failure-threshold 3
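These settings translate directly into how long a failover takes. As a rough estimate, detection needs about one request interval per failed check, and resolvers may then keep serving the old answer for up to the record's TTL. The sketch below does that arithmetic for a 30-second interval and a failure threshold of 3, as used in this guide's health check examples. The function name `failover_seconds` and the 60-second TTL are assumptions for illustration.

```shell
# Rough worst-case time before clients reach the secondary site:
# detection time (interval x failure threshold) plus cached answers (TTL).
# $1 = request interval (s), $2 = failure threshold, $3 = record TTL (s).
failover_seconds() {
  interval="$1"
  threshold="$2"
  ttl="$3"
  echo $(( interval * threshold + ttl ))
}

failover_seconds 30 3 60   # prints: 150
failover_seconds 10 3 60   # prints: 90
```

The second call shows why a 10-second interval is worth considering when faster detection matters more than health check cost.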
DNS Performance:
Health Check Security:
Failover Security:
Set Up Health Check Alarms:
# Create CloudWatch alarm for health check failures
# (Route 53 health check metrics are published only in us-east-1)
aws cloudwatch put-metric-alarm \
--region us-east-1 \
--alarm-name "Route53-HealthCheck-Failure" \
--alarm-description "Alert when primary site health check fails" \
--metric-name HealthCheckStatus \
--namespace AWS/Route53 \
--statistic Minimum \
--period 60 \
--evaluation-periods 1 \
--threshold 1 \
--comparison-operator LessThanThreshold \
--dimensions Name=HealthCheckId,Value=YOUR_HEALTH_CHECK_ID
Create SNS Topic for Alerts:
# Create SNS topic
aws sns create-topic --name "DNS-Failover-Alerts"
# Subscribe email to topic
aws sns subscribe \
--topic-arn "arn:aws:sns:region:account:topic-name" \
--protocol email \
--notification-endpoint "admin@yourdomain.com"
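An alarm only notifies anyone once it has an action attached, so the alarm and the SNS topic need to be connected. One way to do that is to pass the topic ARN via `--alarm-actions`. The sketch below wraps the command in a dry-run helper so it can be reviewed before anything is created. The `run` and `create_failover_alarm` names, the `DRY_RUN` variable, and the account ID in the example ARN are all placeholders invented for this illustration.

```shell
# Hypothetical wrapper: print commands instead of running them
# unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the health check alarm with an SNS topic as its action.
# $1 = health check ID, $2 = SNS topic ARN.
create_failover_alarm() {
  run aws cloudwatch put-metric-alarm \
    --region us-east-1 \
    --alarm-name "Route53-HealthCheck-Failure" \
    --namespace AWS/Route53 \
    --metric-name HealthCheckStatus \
    --dimensions "Name=HealthCheckId,Value=$1" \
    --statistic Minimum \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 1 \
    --comparison-operator LessThanThreshold \
    --alarm-actions "$2"
}

# Dry run with placeholder values; set DRY_RUN=0 to execute for real.
create_failover_alarm YOUR_HEALTH_CHECK_ID \
  "arn:aws:sns:us-east-1:123456789012:DNS-Failover-Alerts"
```

Because the alarm lives in us-east-1, the topic in this sketch is assumed to be in us-east-1 as well.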
Create CloudWatch Dashboard:
{
"widgets": [
{
"type": "metric",
"properties": {
"metrics": [
[
"AWS/Route53",
"HealthCheckStatus",
"HealthCheckId",
"YOUR_HEALTH_CHECK_ID"
]
],
"period": 300,
"stat": "Average",
"region": "us-east-1",
"title": "Primary Site Health Check"
}
}
]
}
For Global Applications:
Implementation:
# Create additional health checks for different regions
aws route53 create-health-check \
--caller-reference "primary-site-$(date +%s)" \
--health-check-config '{
"Type": "HTTPS",
"ResourcePath": "/",
"FullyQualifiedDomainName": "d1234abcd.cloudfront.net",
"RequestInterval": 30,
"FailureThreshold": 3
}'
Combine Multiple Routing Policies:
Route 53 Geographic Routing:
# Create a geolocation record for US traffic, tied to the primary health check
# (EvaluateTargetHealth must be false when the alias target is CloudFront)
aws route53 change-resource-record-sets \
--hosted-zone-id YOUR_HOSTED_ZONE_ID \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "yourdomain.com",
"Type": "A",
"SetIdentifier": "US-Primary",
"GeoLocation": {
"CountryCode": "US"
},
"HealthCheckId": "YOUR_HEALTH_CHECK_ID",
"AliasTarget": {
"DNSName": "d1234abcd.cloudfront.net",
"EvaluateTargetHealth": false,
"HostedZoneId": "Z2FDTNDATAQYW2"
}
}
}]
}'
Automated Recovery:
Manual Recovery Steps:
Content Backup:
# Backup primary site content
aws s3 sync s3://primary-bucket/ s3://backup-bucket/ --delete
# Back up Route 53 records (get-hosted-zone would return only zone metadata)
aws route53 list-resource-record-sets --hosted-zone-id YOUR_HOSTED_ZONE_ID > route53-backup.json
Configuration Backup:
Health Check Not Failing Over:
Problem: Health check shows unhealthy but DNS doesn’t switch to secondary
Solutions:
# Check whether a submitted record change has propagated (PENDING or INSYNC)
aws route53 get-change --id YOUR_CHANGE_ID
# Verify health check configuration
aws route53 get-health-check --health-check-id YOUR_HEALTH_CHECK_ID
False Positive Health Check Failures:
Problem: Health check fails but primary site is actually working
Solutions:
DNS Propagation Issues:
Problem: Changes not visible globally
Solutions:
Health Check Debugging:
# Test health check endpoint
curl -I https://d1234abcd.cloudfront.net
# Check health check metrics (published only in us-east-1; adjust the time range)
aws cloudwatch get-metric-statistics \
--region us-east-1 \
--namespace AWS/Route53 \
--metric-name HealthCheckStatus \
--dimensions Name=HealthCheckId,Value=YOUR_HEALTH_CHECK_ID \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-02T00:00:00Z \
--period 300 \
--statistics Average
DNS Debugging:
# Check DNS resolution from different locations
dig @8.8.8.8 yourdomain.com
dig @1.1.1.1 yourdomain.com
# Check specific record types
dig yourdomain.com A
dig yourdomain.com AAAA
Key Metrics to Monitor:
Monitoring Commands:
# Review the health check configuration
aws route53 get-health-check --health-check-id YOUR_HEALTH_CHECK_ID --query 'HealthCheck.HealthCheckConfig'
# Check DNS performance
time nslookup yourdomain.com
time dig yourdomain.com
Important: Follow this exact order to avoid dependency issues.
Phase 1: Delete Route 53 Records
Delete Secondary Record:
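# DELETE must match the existing record exactly: also include the record's
# AliasTarget (or TTL and ResourceRecords) as shown by list-resource-record-sets
aws route53 change-resource-record-sets \
--hosted-zone-id YOUR_HOSTED_ZONE_ID \
--change-batch '{
"Changes": [{
"Action": "DELETE",
"ResourceRecordSet": {
"Name": "yourdomain.com",
"Type": "A",
"SetIdentifier": "secondary-maintenance-site",
"Failover": "SECONDARY"
}
}]
}'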
Revert Primary Record to Simple Routing:
Phase 2: Delete Health Check
# Delete health check
aws route53 delete-health-check --health-check-id YOUR_HEALTH_CHECK_ID
Phase 3: Delete Secondary S3 Bucket
Empty Bucket:
aws s3 rm s3://your-project-failover-site --recursive
Delete Bucket:
aws s3 rb s3://your-project-failover-site
Monthly Savings After Cleanup:
Verify Cleanup:
# Check no health checks remain
aws route53 list-health-checks
# Check no secondary records
aws route53 list-resource-record-sets --hosted-zone-id YOUR_HOSTED_ZONE_ID
# Verify primary site still works
curl -I https://yourdomain.com
Optimal Settings for Production:
TTL Settings:
Essential Alerts:
Health Check Security:
Failover Security:
Monthly Cost Management:
Cost Monitoring:
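# Set up billing alarm (billing metrics are published in us-east-1;
# single quotes keep the shell from expanding $10)
aws cloudwatch put-metric-alarm \
--region us-east-1 \
--alarm-name "Monthly-Billing-Alert" \
--alarm-description 'Alert when monthly charges exceed $10' \
--metric-name EstimatedCharges \
--namespace AWS/Billing \
--dimensions Name=Currency,Value=USD \
--statistic Maximum \
--period 86400 \
--evaluation-periods 1 \
--threshold 10 \
--comparison-operator GreaterThanThreshold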
Three-Tier Architecture:
Global DNS Routing:
Beyond DNS Failover:
Implementing Route 53 DNS failover provides a robust, automated solution for maintaining high availability of your website. This production-grade setup ensures:
With health checks, failover records, and monitoring in place, your users keep access to your content, or at minimum to a maintenance page, during unexpected outages and maintenance periods.
For questions or advanced configurations, refer to the AWS Route 53 documentation or consult with your DevOps team for custom implementations.