# Rollback Procedures

## Overview

This document provides step-by-step procedures for rolling back the HVAC Know It All Content Aggregator in case of deployment issues or system failures.

## Risk Assessment

### Severity Levels

- **CRITICAL**: System completely non-functional, no data collection
- **HIGH**: Major features broken, partial data loss
- **MEDIUM**: Some scrapers failing, degraded performance
- **LOW**: Minor issues, cosmetic problems

## Pre-Rollback Checklist

### Before Rolling Back

1. **Document the Issue**
   - [ ] Screenshot error messages
   - [ ] Save relevant log files
   - [ ] Note exact time of failure
   - [ ] Record affected components

2. **Attempt Quick Fixes**
   - [ ] Check environment variables
   - [ ] Verify network connectivity
   - [ ] Restart failed service
   - [ ] Check disk space

3. **Backup Current State**
   ```bash
   # Backup current state before rollback
   sudo tar -czf /backup/emergency-$(date +%Y%m%d-%H%M%S).tar.gz \
     /opt/hvac-kia-content/state/ \
     /opt/hvac-kia-content/data/ \
     /var/log/hvac-content/
   ```

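Where a scripted backup is preferred over the one-off `tar` command, the same step could be sketched in Python as follows. This is a minimal sketch, not part of the project: the function name `create_emergency_backup` and the `DEFAULT_SOURCES` constant are hypothetical, and the three source paths are simply the ones used in the command above. Unlike `tar`, it skips paths that do not exist and reports them instead of failing.

```python
import tarfile
import time
from pathlib import Path

# Paths copied from the tar command above.
DEFAULT_SOURCES = (
    "/opt/hvac-kia-content/state",
    "/opt/hvac-kia-content/data",
    "/var/log/hvac-content",
)


def create_emergency_backup(backup_dir, sources=DEFAULT_SOURCES):
    """Archive every existing source path; return (archive_path, missing_paths)."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = Path(backup_dir) / f"emergency-{stamp}.tar.gz"
    archive.parent.mkdir(parents=True, exist_ok=True)
    missing = []
    with tarfile.open(archive, "w:gz") as tar:
        for src in map(Path, sources):
            if src.exists():
                # Store names without the leading "/" (as tar does by default),
                # so extracting from / puts the files back in place.
                tar.add(src, arcname=str(src).lstrip("/"))
            else:
                missing.append(str(src))
    return archive, missing
```

Calling `create_emergency_backup("/backup")` would write an archive comparable to the one produced by the command above. The returned `missing` list makes it visible when one of the expected directories was absent at backup time.
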
## Rollback Scenarios

### Scenario 1: Service Won't Start

**Symptoms:** Systemd service fails to start after deployment

**Quick Fix:**
```bash
# Check service status
systemctl status hvac-content-aggregator.service

# Check journal for errors
journalctl -u hvac-content-aggregator.service -n 100

# Validate environment
cd /opt/hvac-kia-content
python3 -c "from run_production import validate_environment; validate_environment()"
```

**Rollback Steps:**
1. Stop the timer:
   ```bash
   sudo systemctl stop hvac-content-aggregator.timer
   ```

2. Revert to previous version:
   ```bash
   cd /opt/hvac-kia-content
   git fetch --tags
   git checkout v1.0.0  # Previous stable version
   ```

3. Reinstall dependencies:
   ```bash
   pip install -r requirements.txt
   ```

4. Restart service:
   ```bash
   sudo systemctl daemon-reload
   sudo systemctl start hvac-content-aggregator.timer
   ```

### Scenario 2: Data Corruption

**Symptoms:** Malformed output, duplicate entries, missing data

**Quick Fix:**
```bash
# Check state files
ls -la /opt/hvac-kia-content/state/

# Validate JSON state files
python3 -c "import json; json.load(open('/opt/hvac-kia-content/state/youtube_state.json'))"
```

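The one-liner above checks a single state file. A minimal sketch that extends the same check to every state file might look like the following. It assumes that all state files live directly in the state directory and use a `.json` extension (only `youtube_state.json` is named in this document), and `find_invalid_state_files` is a hypothetical helper name. It returns the failures instead of stopping at the first one, so that all corrupted files can be recorded before restoring.

```python
import json
from pathlib import Path


def find_invalid_state_files(state_dir):
    """Return {file name: error message} for every *.json file that fails to parse."""
    problems = {}
    for path in sorted(Path(state_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError, OSError) as exc:
            # Keep going so every damaged file is reported in one pass.
            problems[path.name] = str(exc)
    return problems


# Example call (path taken from the commands above):
# find_invalid_state_files("/opt/hvac-kia-content/state")
```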
**Rollback Steps:**
1. Stop all services:
   ```bash
   sudo systemctl stop hvac-content-aggregator.timer
   sudo systemctl stop hvac-tiktok-captions.timer
   ```

2. Restore state from backup:
   ```bash
   # Find latest backup
   ls -lt /backup/hvac-state-*.tar.gz | head -1

   # Restore state files (replace the file name with the backup found above)
   cd /
   sudo tar -xzf /backup/hvac-state-20241217.tar.gz
   ```

3. Clear corrupted output:
   ```bash
   # Move corrupted files to quarantine
   mkdir -p /opt/hvac-kia-content/quarantine
   mv /opt/hvac-kia-content/data/*_corrupted.md /opt/hvac-kia-content/quarantine/
   ```

4. Restart services:
   ```bash
   # Start both timers stopped in step 1
   sudo systemctl start hvac-content-aggregator.timer
   sudo systemctl start hvac-tiktok-captions.timer
   ```

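Step 2 relies on reading the newest file name off the `ls -lt` output and typing it into the `tar` command. A small sketch of selecting that file programmatically is shown below. The helper name `latest_backup` is hypothetical, and the default pattern is the `hvac-state-*.tar.gz` naming used in step 2; adjust it if your backups are named differently.

```python
from pathlib import Path


def latest_backup(backup_dir, pattern="hvac-state-*.tar.gz"):
    """Return the most recently modified archive matching pattern, or None."""
    candidates = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    # max() with default=None avoids an exception when no backup exists.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

A `None` result means there is nothing to restore from, which is worth confirming before any state files are overwritten.
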
### Scenario 3: Performance Degradation

**Symptoms:** Slow execution, timeouts, high CPU/memory usage

**Quick Fix:**
```bash
# Check resource usage (-d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, -f run_production.py)"

# Check disk space
df -h /opt/hvac-kia-content

# Clear old logs
find /var/log/hvac-content -name "*.log" -mtime +7 -delete
```

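Because `find ... -delete` removes files without confirmation, it can help to preview what would be deleted first. The sketch below lists the candidate log files without touching them. The function name `logs_older_than` is hypothetical, and the age test is written to mirror `find -mtime +7`, which ignores fractional days and so matches files at least eight full days old.

```python
import time
from pathlib import Path


def logs_older_than(log_dir, days=7, now=None):
    """Return *.log files under log_dir whose age in whole days exceeds `days`."""
    now = time.time() if now is None else now
    matches = []
    for path in Path(log_dir).rglob("*.log"):
        if not path.is_file():
            continue
        # Whole 24-hour periods since last modification, fraction discarded.
        age_days = int((now - path.stat().st_mtime) // 86400)
        if age_days > days:
            matches.append(path)
    return sorted(matches)


# Example call (path taken from the command above):
# for path in logs_older_than("/var/log/hvac-content"):
#     print(path)
```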
**Rollback Steps:**
1. Reduce scraper limits temporarily:
   ```bash
   # Edit production config
   nano /opt/hvac-kia-content/config/production.py
   # Reduce max_posts, max_videos, etc.
   ```

2. Disable problematic scrapers:
   ```python
   # In config/production.py
   SCRAPERS_CONFIG = {
       "instagram": {
           "enabled": False,  # Temporarily disable
           # ... other settings unchanged
       }
   }
   ```

3. Restart with reduced load:
   ```bash
   sudo systemctl restart hvac-content-aggregator.service
   ```

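If several scrapers need to be switched off at once, editing the dictionary by hand is error-prone. The following is a rough sketch of doing it in code, assuming `SCRAPERS_CONFIG` is a plain dict of per-scraper dicts with an `enabled` key, as in step 2. The helper `disable_scrapers` and the `youtube` and `max_posts` entries in the commented example are illustrative only and are not taken from the project's actual config.

```python
import copy


def disable_scrapers(scrapers_config, names):
    """Return a deep copy of scrapers_config with the named scrapers disabled."""
    unknown = sorted(set(names) - set(scrapers_config))
    if unknown:
        # Fail loudly on typos rather than silently leaving a scraper enabled.
        raise KeyError(f"unknown scraper(s): {', '.join(unknown)}")
    updated = copy.deepcopy(scrapers_config)
    for name in names:
        updated[name]["enabled"] = False
    return updated


# Illustrative use with a made-up config:
# config = {"instagram": {"enabled": True, "max_posts": 50},
#           "youtube": {"enabled": True}}
# reduced = disable_scrapers(config, ["instagram"])
```

Returning a copy leaves the original settings intact, so re-enabling the scrapers after the incident is a matter of discarding the reduced config.
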
### Scenario 4: Complete System Failure

**Symptoms:** Nothing works, multiple component failures

**Full System Rollback:**

1. **Stop Everything:**
   ```bash
   # Stop all timers and services
   sudo systemctl stop hvac-content-aggregator.timer
   sudo systemctl stop hvac-tiktok-captions.timer
   sudo systemctl disable hvac-content-aggregator.timer
   sudo systemctl disable hvac-tiktok-captions.timer
   ```

2. **Backup Current State:**
   ```bash
   # Full backup before rollback
   sudo tar -czf /backup/full-backup-$(date +%Y%m%d-%H%M%S).tar.gz \
     /opt/hvac-kia-content/ \
     /etc/systemd/system/hvac-*.{service,timer} \
     /var/log/hvac-content/
   ```

3. **Clean Installation:**
   ```bash
   # Remove current installation
   sudo rm -rf /opt/hvac-kia-content
   sudo rm -f /etc/systemd/system/hvac-*

   # Clone stable version
   cd /opt
   sudo git clone https://github.com/yourusername/hvac-kia-content.git
   cd hvac-kia-content
   sudo git checkout v1.0.0  # Last known stable

   # Restore configuration
   # (assumes a copy of .env was saved to /backup/.env before the removal above)
   sudo cp /backup/.env /opt/hvac-kia-content/

   # Set permissions
   sudo chown -R $USER:$USER /opt/hvac-kia-content
   ```

4. **Reinstall Services:**
   ```bash
   cd /opt/hvac-kia-content
   ./install_production.sh
   ```

5. **Restore State (Optional):**
   ```bash
   # Only if state is not corrupted
   sudo tar -xzf /backup/hvac-state-latest.tar.gz -C /
   ```

6. **Verify and Start:**
   ```bash
   # Test first
   python3 run_production.py --dry-run

   # If successful, enable services
   sudo systemctl enable hvac-content-aggregator.timer
   sudo systemctl start hvac-content-aggregator.timer
   ```

## Post-Rollback Verification

### Immediate Checks
- [ ] Services are running:
  ```bash
  systemctl status hvac-content-aggregator.timer
  ```
- [ ] No errors in logs:
  ```bash
  tail -n 100 /var/log/hvac-content/aggregator.log | grep ERROR
  ```
- [ ] Test run successful:
  ```bash
  cd /opt/hvac-kia-content
  python3 test_real_data.py --source youtube --items 1
  ```

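The log check above can also be scripted so that it yields a result a verification script can act on. The sketch below reads the last 100 lines of a log and returns those containing `ERROR`, matching the `tail | grep` pipeline. The function name `recent_error_lines` is hypothetical, and the log is assumed to be plain text.

```python
from collections import deque


def recent_error_lines(log_path, tail=100, marker="ERROR"):
    """Return the lines among the last `tail` lines of log_path that contain marker."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the final `tail` lines while streaming the file.
        last_lines = deque(handle, maxlen=tail)
    return [line.rstrip("\n") for line in last_lines if marker in line]
```

An empty list from `recent_error_lines("/var/log/hvac-content/aggregator.log")` corresponds to the "No errors in logs" check passing.
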
### 1-Hour Verification
- [ ] Timer fired as scheduled
- [ ] All scrapers executed
- [ ] Output files generated
- [ ] NAS sync completed
- [ ] No memory leaks
- [ ] CPU usage normal

### 24-Hour Verification
- [ ] System stable
- [ ] No missed schedules
- [ ] Data quality good
- [ ] No duplicate entries
- [ ] Incremental updates working

## Emergency Contacts

### Technical Support
- **Primary Contact:** [Name] - [Phone] - [Email]
- **Secondary Contact:** [Name] - [Phone] - [Email]
- **Escalation:** [Manager Name] - [Phone] - [Email]

### System Access
- **Server:** production-scraper.example.com
- **SSH:** `ssh user@production-scraper.example.com`
- **Logs:** `/var/log/hvac-content/`
- **Config:** `/opt/hvac-kia-content/.env`

## Recovery Time Objectives

| Scenario | Target Recovery Time | Maximum Data Loss |
|----------|---------------------|-------------------|
| Service Restart | 5 minutes | None |
| Version Rollback | 15 minutes | Since last backup |
| State Restoration | 30 minutes | 24 hours |
| Complete Rebuild | 1 hour | 48 hours |

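During a drill or a real incident, the targets in this table can be checked mechanically against the measured rollback duration. The sketch below encodes the target times from the table in minutes. The names `RECOVERY_TARGETS_MIN` and `within_target` are hypothetical, and timestamps are assumed to be epoch seconds such as those returned by `time.time()`.

```python
# Target recovery times in minutes, copied from the table above.
RECOVERY_TARGETS_MIN = {
    "Service Restart": 5,
    "Version Rollback": 15,
    "State Restoration": 30,
    "Complete Rebuild": 60,
}


def within_target(scenario, started_at, finished_at):
    """Return (target_met, elapsed_minutes) for a rollback timed in epoch seconds."""
    elapsed_min = (finished_at - started_at) / 60
    return elapsed_min <= RECOVERY_TARGETS_MIN[scenario], elapsed_min
```

For example, a version rollback that took 600 seconds gives `within_target("Version Rollback", 0, 600)`, which evaluates to `(True, 10.0)`. The elapsed figure is also a natural value to record in the incident log.
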
## Lessons Learned Log

### Previous Incidents
Document any rollbacks performed and lessons learned:

| Date | Issue | Resolution | Prevention |
|------|-------|------------|------------|
| | | | |

## Backup Schedule

### Automated Backups
```bash
# Add to crontab (runs daily at 02:00)
0 2 * * * /opt/hvac-kia-content/scripts/backup.sh
```

### Backup Script
```bash
#!/bin/bash
# /opt/hvac-kia-content/scripts/backup.sh

BACKUP_DIR="/backup/hvac-content"
DATE=$(date +%Y%m%d)
RETENTION_DAYS=30

# Ensure the backup directory exists
mkdir -p "$BACKUP_DIR"

# Create backup
tar -czf "$BACKUP_DIR/state-$DATE.tar.gz" /opt/hvac-kia-content/state/
tar -czf "$BACKUP_DIR/config-$DATE.tar.gz" /opt/hvac-kia-content/.env

# Clean old backups
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +"$RETENTION_DAYS" -delete

# Verify backup by listing the archive contents
if tar -tzf "$BACKUP_DIR/state-$DATE.tar.gz" > /dev/null 2>&1; then
    echo "Backup successful: $DATE"
else
    echo "Backup failed: $DATE" | mail -s "HVAC Backup Failed" alerts@example.com
fi
```

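The verification step in the script only confirms that the archive can be listed. A slightly stricter check, sketched below with Python's standard `tarfile` module, also rejects an archive that reads cleanly but contains no members. The function name `verify_archive` is hypothetical, and the sketch assumes gzip-compressed archives like those the script creates.

```python
import tarfile


def verify_archive(archive_path):
    """Return the member count if the archive reads cleanly; raise otherwise."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Listing all members walks the whole archive, like `tar -tzf`.
        members = tar.getmembers()
    if not members:
        raise ValueError(f"{archive_path} contains no files")
    return len(members)
```

A damaged or non-gzip file raises an exception from `tarfile`, which a calling script could turn into the same alert email the shell script sends.
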
## Testing Rollback Procedures

### Monthly Drill
1. Schedule maintenance window
2. Perform controlled rollback
3. Verify recovery procedures
4. Document any issues
5. Update procedures as needed

### Test Checklist
- [ ] Backup procedures work
- [ ] Rollback completes in target time
- [ ] Data integrity maintained
- [ ] Services restart properly
- [ ] Monitoring alerts fire
- [ ] Documentation is current

---

*Last Updated: 2024-12-18*
*Version: 1.0*
*Next Review: 2025-01-18*