Rollback Procedures
Overview
This document provides step-by-step procedures for rolling back the HVAC Know It All Content Aggregator in case of deployment issues or system failures.
Risk Assessment
Severity Levels
- CRITICAL: System completely non-functional, no data collection
- HIGH: Major features broken, partial data loss
- MEDIUM: Some scrapers failing, degraded performance
- LOW: Minor issues, cosmetic problems
Pre-Rollback Checklist
Before Rolling Back
- Document the Issue
  - Screenshot error messages
  - Save relevant log files
  - Note exact time of failure
  - Record affected components
- Attempt Quick Fixes
  - Check environment variables
  - Verify network connectivity
  - Restart failed service
  - Check disk space
- Backup Current State
  # Backup current state before rollback
  sudo tar -czf /backup/emergency-$(date +%Y%m%d-%H%M%S).tar.gz \
    /opt/hvac-kia-content/state/ \
    /opt/hvac-kia-content/data/ \
    /var/log/hvac-content/
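The same emergency snapshot can be taken from a Python rollback helper with the standard `tarfile` module. This is a minimal sketch, not part of the project. The function name `emergency_backup` and its parameters are illustrative. It assumes the caller can read the listed directories and write to the backup directory. Missing paths are skipped with a warning, so a partial snapshot is still written.

```python
import tarfile
import time
from pathlib import Path


def emergency_backup(sources, backup_dir="/backup"):
    """Write <backup_dir>/emergency-<timestamp>.tar.gz containing every existing source.

    Paths are stored relative to / (e.g. opt/hvac-kia-content/state), so the
    archive can be restored with `tar -xzf <archive> -C /`.
    Returns the Path of the archive that was written.
    """
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")  # same format as $(date +%Y%m%d-%H%M%S)
    archive = backup_dir / f"emergency-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        for src in map(Path, sources):
            if not src.exists():
                # Skip missing paths instead of aborting the whole backup
                print(f"warning: {src} does not exist, skipping")
                continue
            # Directories are added recursively; strip the leading "/" from names
            tar.add(str(src), arcname=str(src.resolve()).lstrip("/"))
    return archive


# Example (hypothetical helper, same paths as the tar command above):
# emergency_backup(["/opt/hvac-kia-content/state",
#                   "/opt/hvac-kia-content/data",
#                   "/var/log/hvac-content"])
```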
Rollback Scenarios
Scenario 1: Service Won't Start
Symptoms: Systemd service fails to start after deployment
Quick Fix:
# Check service status
systemctl status hvac-content-aggregator.service
# Check journal for errors
journalctl -u hvac-content-aggregator.service -n 100
# Validate environment
cd /opt/hvac-kia-content
python3 -c "from run_production import validate_environment; validate_environment()"
Rollback Steps:
- Stop the timer:
  sudo systemctl stop hvac-content-aggregator.timer
- Revert to the previous version:
  cd /opt/hvac-kia-content
  git fetch --tags
  git checkout v1.0.0  # Previous stable version
- Reinstall dependencies:
  pip install -r requirements.txt
- Reload systemd and restart the timer:
  sudo systemctl daemon-reload
  sudo systemctl start hvac-content-aggregator.timer
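Under pressure it is easy to check out the wrong tag. A small helper can pick the newest release older than the one currently deployed from the output of `git tag`. This assumes releases are tagged `vMAJOR.MINOR.PATCH`, inferred from the `v1.0.0` tag above. `previous_stable_tag` is a hypothetical name, and it treats every tag in that format as stable. Versions are compared numerically rather than as strings, so `v1.10.0` correctly sorts after `v1.9.0`.

```python
import re

# Assumed release-tag format: vMAJOR.MINOR.PATCH (e.g. v1.0.0); other tags are ignored
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Return (major, minor, patch) as ints, or None if tag is not a release tag."""
    match = TAG_RE.match(tag.strip())
    return tuple(int(part) for part in match.groups()) if match else None


def previous_stable_tag(git_tag_output, current_tag):
    """Return the newest release tag strictly older than current_tag, or None.

    git_tag_output is the text printed by `git tag` (one tag per line).
    """
    current = parse_tag(current_tag)
    if current is None:
        raise ValueError(f"unrecognised tag format: {current_tag!r}")
    older = []
    for line in git_tag_output.splitlines():
        version = parse_tag(line)
        if version is not None and version < current:
            older.append((version, line.strip()))
    # Tuples compare element by element, so max() picks the highest version
    return max(older)[1] if older else None
```

To use it, pass the helper the stdout of `subprocess.run(["git", "tag"], capture_output=True, text=True, check=True)` and the currently deployed tag. Have a person confirm the result before running `git checkout`.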
Scenario 2: Data Corruption
Symptoms: Malformed output, duplicate entries, missing data
Quick Fix:
# Check state files
ls -la /opt/hvac-kia-content/state/
# Validate JSON state files
python3 -c "import json; json.load(open('/opt/hvac-kia-content/state/youtube_state.json'))"
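The one-liner above checks only `youtube_state.json`. A short script can check every state file in one pass. It assumes all state files are `*.json` files directly inside `state/`, and the function name is illustrative. An empty result means every file parsed as JSON. This includes catching empty or truncated files, which `json.load` rejects.

```python
import json
from pathlib import Path


def find_corrupted_state_files(state_dir="/opt/hvac-kia-content/state"):
    """Return {file name: error} for every *.json file in state_dir that fails to parse.

    An empty dict means every state file is valid JSON.
    """
    problems = {}
    for path in sorted(Path(state_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (OSError, ValueError) as exc:
            # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses
            problems[path.name] = str(exc)
    return problems


# Example:
# for name, error in find_corrupted_state_files().items():
#     print(f"CORRUPTED {name}: {error}")
```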
Rollback Steps:
- Stop all services:
  sudo systemctl stop hvac-content-aggregator.timer
  sudo systemctl stop hvac-tiktok-captions.timer
- Restore state from backup:
  # Find the latest state backup (written nightly by scripts/backup.sh)
  LATEST=$(ls -t /backup/hvac-content/state-*.tar.gz | head -1)
  echo "Restoring $LATEST"
  # Restore state files (archive paths are relative to /)
  sudo tar -xzf "$LATEST" -C /
- Clear corrupted output:
  # Move corrupted files to quarantine
  mkdir -p /opt/hvac-kia-content/quarantine
  mv /opt/hvac-kia-content/data/*_corrupted.md /opt/hvac-kia-content/quarantine/
- Restart services:
  sudo systemctl start hvac-content-aggregator.timer
  sudo systemctl start hvac-tiktok-captions.timer
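Picking the newest file with `ls ... | head -1` restores it even if last night's backup was cut short, for example by a full disk. A restore helper can instead walk the archives newest-first and pick the first one that decompresses cleanly. This is a sketch using the standard `gzip` and `tarfile` modules. The function names are illustrative. The default pattern assumes the `state-YYYYMMDD.tar.gz` naming used by `scripts/backup.sh` in the Backup Schedule section.

```python
import glob
import gzip
import os
import tarfile
import zlib


def is_readable_archive(path):
    """Return True if path is a complete gzip file containing a parseable tar archive."""
    try:
        # Reading to EOF makes gzip check the CRC and length stored in its trailer
        with gzip.open(path, "rb") as fh:
            while fh.read(1 << 20):
                pass
        # Then confirm the decompressed data is a tar archive with readable headers
        with tarfile.open(path, "r:gz") as tar:
            tar.getmembers()
        return True
    except (OSError, EOFError, zlib.error, tarfile.TarError):
        return False


def latest_valid_backup(pattern="/backup/hvac-content/state-*.tar.gz"):
    """Return the newest archive (by mtime) matching pattern that is_readable_archive accepts."""
    for path in sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True):
        if is_readable_archive(path):
            return path
        print(f"skipping unreadable backup: {path}")
    return None
```

The returned path can then be restored exactly as in the shell step: `sudo tar -xzf <path> -C /`.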
Scenario 3: Performance Degradation
Symptoms: Slow execution, timeouts, high CPU/memory usage
Quick Fix:
# Check resource usage
top -p "$(pgrep -d, -f run_production.py)"  # -d, joins multiple PIDs with commas, as top -p expects
# Check disk space
df -h /opt/hvac-kia-content
# Clear old logs
find /var/log/hvac-content -name "*.log" -mtime +7 -delete
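The `find ... -delete` command removes files immediately. When disk pressure is the suspected cause, a dry run first shows what would be removed. This is a minimal sketch, and the function name is illustrative. Note one difference in cutoff: `find -mtime +7` rounds ages down to whole days, so it only matches files at least 8 days old. This version uses an exact 7-day cutoff instead.

```python
import time
from pathlib import Path


def prune_old_logs(log_dir="/var/log/hvac-content", max_age_days=7, dry_run=True):
    """Delete *.log files older than max_age_days; return the affected paths.

    Similar to `find ... -name "*.log" -mtime +7 -delete`, but defaults to a
    dry run so the list can be reviewed before anything is removed.
    """
    cutoff = time.time() - max_age_days * 86400
    old = sorted(
        path for path in Path(log_dir).rglob("*.log")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
    if not dry_run:
        for path in old:
            path.unlink()
    return old
```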
Rollback Steps:
- Reduce scraper limits temporarily:
  # Edit production config
  nano /opt/hvac-kia-content/config/production.py
  # Reduce max_posts, max_videos, etc.
- Disable problematic scrapers:
  # In config/production.py
  SCRAPERS_CONFIG = {
      "instagram": {
          "enabled": False,  # Temporarily disable
          ...
      }
  }
- Restart with reduced load:
  sudo systemctl restart hvac-content-aggregator.service
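Setting `"enabled": False` only takes effect if the code that runs the scrapers honors the flag. Suppose `SCRAPERS_CONFIG` maps scraper names to option dicts, as in the snippet above. The filtering could then look like this. `enabled_scrapers` is a hypothetical helper, not necessarily how `run_production.py` implements it, and the example values are illustrative.

```python
def enabled_scrapers(scrapers_config):
    """Return the names of scrapers whose config does not set enabled=False.

    Scrapers without an explicit "enabled" key are treated as enabled, so
    disabling one during an incident is a one-line config change.
    """
    return [name for name, cfg in scrapers_config.items()
            if cfg.get("enabled", True)]


# Illustrative config in the assumed shape of config/production.py
SCRAPERS_CONFIG = {
    "youtube": {"enabled": True, "max_videos": 50},
    "instagram": {"enabled": False, "max_posts": 20},  # temporarily disabled
    "tiktok": {"max_posts": 35},  # no "enabled" key: treated as enabled
}
```

Defaulting a missing key to enabled keeps existing entries working, while the explicit `False` stays a visible, easily reverted change.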
Scenario 4: Complete System Failure
Symptoms: Nothing works, multiple component failures
Full System Rollback:
- Stop Everything:
  # Stop all timers and services
  sudo systemctl stop hvac-content-aggregator.timer
  sudo systemctl stop hvac-tiktok-captions.timer
  sudo systemctl disable hvac-content-aggregator.timer
  sudo systemctl disable hvac-tiktok-captions.timer
- Backup Current State:
  # Full backup before rollback
  sudo tar -czf /backup/full-backup-$(date +%Y%m%d-%H%M%S).tar.gz \
    /opt/hvac-kia-content/ \
    /etc/systemd/system/hvac-*.{service,timer} \
    /var/log/hvac-content/
- Clean Installation:
  # Remove current installation
  sudo rm -rf /opt/hvac-kia-content
  sudo rm -f /etc/systemd/system/hvac-*
  sudo systemctl daemon-reload
  # Clone stable version
  cd /opt
  sudo git clone https://github.com/yourusername/hvac-kia-content.git
  cd hvac-kia-content
  sudo git checkout v1.0.0  # Last known stable
  # Restore configuration (.env) from the latest config backup written by scripts/backup.sh
  sudo tar -xzf "$(ls -t /backup/hvac-content/config-*.tar.gz | head -1)" -C /
  # Set permissions
  sudo chown -R $USER:$USER /opt/hvac-kia-content
- Reinstall Services:
  cd /opt/hvac-kia-content
  ./install_production.sh
- Restore State (Optional):
  # Only if the backed-up state is not corrupted
  sudo tar -xzf "$(ls -t /backup/hvac-content/state-*.tar.gz | head -1)" -C /
- Verify and Start:
  # Test first
  python3 run_production.py --dry-run
  # If successful, enable and start both timers
  sudo systemctl enable hvac-content-aggregator.timer hvac-tiktok-captions.timer
  sudo systemctl start hvac-content-aggregator.timer hvac-tiktok-captions.timer
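Scenario 4 is a strict sequence. Continuing after a failed step can leave a half-removed installation, so a scripted rollback should stop at the first failing command. This is a minimal sketch using `subprocess`. `run_steps` and the step list are illustrative, and it defaults to a dry run that only prints the commands. The example list deliberately covers only the non-destructive "Stop Everything" step.

```python
import shlex
import subprocess


def run_steps(steps, dry_run=True):
    """Run (description, argv) steps in order, stopping at the first failure.

    With dry_run=True the commands are only printed. Returns the number of
    steps that completed (all of them in a dry run).
    """
    for index, (description, argv) in enumerate(steps):
        print(f"[{index + 1}/{len(steps)}] {description}: {shlex.join(argv)}")
        if dry_run:
            continue
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"step failed with exit code {result.returncode}; stopping")
            return index
    return len(steps)


# Illustrative: the "Stop Everything" step of the full system rollback
STOP_STEPS = [
    ("Stop aggregator timer", ["sudo", "systemctl", "stop", "hvac-content-aggregator.timer"]),
    ("Stop TikTok captions timer", ["sudo", "systemctl", "stop", "hvac-tiktok-captions.timer"]),
    ("Disable aggregator timer", ["sudo", "systemctl", "disable", "hvac-content-aggregator.timer"]),
    ("Disable TikTok captions timer", ["sudo", "systemctl", "disable", "hvac-tiktok-captions.timer"]),
]
```

Run `run_steps(STOP_STEPS)` first to review the commands, then `run_steps(STOP_STEPS, dry_run=False)` to execute them.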
Post-Rollback Verification
Immediate Checks
- Services are running:
systemctl status hvac-content-aggregator.timer
- No errors in logs:
tail -n 100 /var/log/hvac-content/aggregator.log | grep ERROR
- Test run successful:
cd /opt/hvac-kia-content
python3 test_real_data.py --source youtube --items 1
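The same log check can also run from a monitoring or verification script. This sketch keeps only the last 100 lines in a bounded `collections.deque`, mirroring `tail -n 100 | grep ERROR`. The function name is illustrative. It assumes log lines contain the literal, case-sensitive level name `ERROR`, as the `grep` above does.

```python
from collections import deque
from pathlib import Path


def recent_errors(log_path="/var/log/hvac-content/aggregator.log", lines=100):
    """Return the ERROR lines among the last `lines` lines of the log.

    Equivalent to `tail -n 100 aggregator.log | grep ERROR`; an empty list
    means the check passed.
    """
    with Path(log_path).open(encoding="utf-8", errors="replace") as fh:
        tail = deque(fh, maxlen=lines)  # keeps only the final `lines` lines
    return [line.rstrip("\n") for line in tail if "ERROR" in line]
```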
1-Hour Verification
- Timer fired as scheduled
- All scrapers executed
- Output files generated
- NAS sync completed
- No memory leaks
- CPU usage normal
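The "Output files generated" check can be automated by listing files modified since the timer fired. This sketch assumes the scrapers write Markdown files somewhere under `/opt/hvac-kia-content/data/`. The `*.md` pattern and the function name `files_updated_since` are assumptions. An empty result an hour after the timer fired suggests the run produced no output.

```python
import time
from pathlib import Path


def files_updated_since(directory="/opt/hvac-kia-content/data", max_age_seconds=3600,
                        pattern="*.md"):
    """Return files under directory (recursively) modified within max_age_seconds."""
    cutoff = time.time() - max_age_seconds
    return sorted(
        path for path in Path(directory).rglob(pattern)
        if path.is_file() and path.stat().st_mtime >= cutoff
    )
```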
24-Hour Verification
- System stable
- No missed schedules
- Data quality good
- No duplicate entries
- Incremental updates working
Emergency Contacts
Technical Support
- Primary Contact: [Name] - [Phone] - [Email]
- Secondary Contact: [Name] - [Phone] - [Email]
- Escalation: [Manager Name] - [Phone] - [Email]
System Access
- Server: production-scraper.example.com
- SSH: ssh user@production-scraper.example.com
- Logs: /var/log/hvac-content/
- Config: /opt/hvac-kia-content/.env
Recovery Time Objectives
| Scenario | Target Recovery Time | Maximum Data Loss | 
|---|---|---|
| Service Restart | 5 minutes | None | 
| Version Rollback | 15 minutes | Since last backup | 
| State Restoration | 30 minutes | 24 hours | 
| Complete Rebuild | 1 hour | 48 hours | 
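During the monthly drill, record when each recovery starts and when the system is verified. Those two timestamps make it straightforward to compare the result against these targets. This is a minimal sketch. The dictionary simply restates the Target Recovery Time column, and `check_rto` is an illustrative name.

```python
from datetime import timedelta

# Target recovery times from the table above
RTO_TARGETS = {
    "Service Restart": timedelta(minutes=5),
    "Version Rollback": timedelta(minutes=15),
    "State Restoration": timedelta(minutes=30),
    "Complete Rebuild": timedelta(hours=1),
}


def check_rto(scenario, started, recovered):
    """Return (met_target, elapsed) for a recovery that ran from started to recovered."""
    elapsed = recovered - started
    return elapsed <= RTO_TARGETS[scenario], elapsed
```

The elapsed time can go straight into the Lessons Learned Log whenever a target is missed.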
Lessons Learned Log
Previous Incidents
Document any rollbacks performed and lessons learned:
| Date | Issue | Resolution | Prevention | 
|---|---|---|---|
Backup Schedule
Automated Backups
# Add to crontab
0 2 * * * /opt/hvac-kia-content/scripts/backup.sh
Backup Script
#!/bin/bash
# /opt/hvac-kia-content/scripts/backup.sh
BACKUP_DIR="/backup/hvac-content"
DATE=$(date +%Y%m%d)
RETENTION_DAYS=30
# Backups contain .env secrets, so make them readable by the owner only
umask 077
mkdir -p "$BACKUP_DIR"
# Create and verify backups (-C / stores paths relative to /, e.g. opt/hvac-kia-content/state)
if tar -czf "$BACKUP_DIR/state-$DATE.tar.gz" -C / opt/hvac-kia-content/state \
    && tar -czf "$BACKUP_DIR/config-$DATE.tar.gz" -C / opt/hvac-kia-content/.env \
    && tar -tzf "$BACKUP_DIR/state-$DATE.tar.gz" > /dev/null 2>&1; then
    echo "Backup successful: $DATE"
    # Clean old backups only after today's backup has been verified
    find "$BACKUP_DIR" -name "*.tar.gz" -mtime +"$RETENTION_DAYS" -delete
else
    echo "Backup failed: $DATE" | mail -s "HVAC Backup Failed" alerts@example.com
    exit 1
fi
Testing Rollback Procedures
Monthly Drill
- Schedule maintenance window
- Perform controlled rollback
- Verify recovery procedures
- Document any issues
- Update procedures as needed
Test Checklist
- Backup procedures work
- Rollback completes in target time
- Data integrity maintained
- Services restart properly
- Monitoring alerts fire
- Documentation is current
Last Updated: 2024-12-18
Version: 1.0
Next Review: 2025-01-18