hvac-kia-content/docs/PRODUCTION_TODO.md
Ben Reed 05218a873b Fix critical production issues and improve spec compliance
Production Readiness Improvements:
- Fixed scheduling to match spec (8 AM & 12 PM ADT instead of 6 AM/6 PM)
- Enabled NAS synchronization in production runner with error handling
- Fixed file naming convention to spec format (hvacknowitall_combined_YYYY-MM-DD-THHMMSS.md)
- Made systemd services portable (removed hardcoded user/paths)
- Added environment variable validation on startup
- Moved DISPLAY/XAUTHORITY to .env configuration

Systemd Improvements:
- Created template service file (@.service) for any user
- Changed all paths to /opt/hvac-kia-content
- Updated installation script for portable deployment
- Fixed service dependencies and resource limits

Documentation:
- Created comprehensive PRODUCTION_TODO.md with 25 tasks
- Added PRODUCTION_GUIDE.md with deployment instructions
- Documented spec compliance gaps (65% complete)

Remaining work includes retry logic, connection pooling, media downloads,
and pytest test suite as documented in PRODUCTION_TODO.md

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-18 20:07:55 -03:00


# Production Readiness Todo List
## Overview
This document outlines all tasks required to meet the original specification and prepare the HVAC Know It All Content Aggregator for production deployment. Tasks are organized by priority and phase.

**Note:** Docker/Kubernetes deployment is not feasible because TikTok scraping requires display server access. The system uses systemd for service management instead.

---
## Phase 1: Meet Original Specification
**Priority: CRITICAL - Core functionality gaps**

**Timeline: Week 1**

### Scheduling & Timing
- [ ] Fix scheduling times to match spec (8 AM & 12 PM ADT instead of 6 AM & 6 PM)
  - Update systemd timer files
  - Update production configuration
  - Test timer activation
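
A minimal sketch of the timer unit, rendered from Python so both run times live in one place. The description text and the `America/Halifax` zone are assumptions: ADT is that zone's summer offset (AST in winter), so naming the zone instead of a fixed UTC offset keeps the runs at 8 AM and 12 PM local time all year. A time zone suffix in `OnCalendar=` needs a systemd release that supports it (see `systemd.time(7)`); otherwise drop the suffix and set the host time zone instead.

```python
"""Render the aggregator's systemd timer (hypothetical helper, not part of the repo)."""

RUN_TIMES = ("08:00:00", "12:00:00")  # spec: 8 AM and 12 PM Atlantic time
TIME_ZONE = "America/Halifax"  # ADT in summer, AST in winter (assumed zone name)


def render_timer_unit(times=RUN_TIMES, tz=TIME_ZONE) -> str:
    """Return the text of a .timer unit with one OnCalendar= line per run time."""
    lines = [
        "[Unit]",
        "Description=HVAC Know It All content aggregation schedule",
        "",
        "[Timer]",
    ]
    # A timer may carry several OnCalendar= lines; each one adds a trigger.
    lines += [f"OnCalendar=*-*-* {t} {tz}" for t in times]
    # Persistent=true fires a trigger that was missed while the timer was inactive.
    lines += ["Persistent=true", "", "[Install]", "WantedBy=timers.target", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_timer_unit())
```

`systemd-analyze calendar '*-*-* 08:00:00 America/Halifax'` prints the next elapse time, which is a quick way to check the schedule before installing the unit.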
### Data Synchronization
- [ ] Enable NAS sync in production runner
  - Add `orchestrator.sync_to_nas()` call
  - Verify NAS mount path
  - Test rsync functionality
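
One way the sync step could look, as a sketch: `sync_to_nas`, `build_rsync_command`, and the `hvacknowitall` subdirectory are hypothetical names, not the orchestrator's actual API. It checks the mount before calling rsync and returns `False` instead of raising, so a NAS outage is logged without discarding the local run.

```python
"""Minimal NAS sync sketch; function names and paths are assumptions."""
import logging
import os
import shutil
import subprocess
from pathlib import Path

log = logging.getLogger("nas_sync")


def build_rsync_command(src: Path, dest: Path) -> list[str]:
    # Trailing slash on src copies its contents into dest instead of nesting src/.
    return ["rsync", "-a", "--partial", f"{src}/", f"{dest}/"]


def sync_to_nas(src: Path, nas_mount: Path, subdir: str = "hvacknowitall") -> bool:
    """Mirror src to the NAS. Returns False (never raises) on any failure so
    the scraping results already written locally are kept."""
    if not os.path.ismount(nas_mount):
        log.error("NAS not mounted at %s; skipping sync", nas_mount)
        return False
    if shutil.which("rsync") is None:
        log.error("rsync not found on PATH; skipping sync")
        return False
    dest = nas_mount / subdir
    dest.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        build_rsync_command(src, dest), capture_output=True, text=True
    )
    if result.returncode != 0:
        log.error("rsync failed (%d): %s", result.returncode, result.stderr.strip())
        return False
    return True
```

Checking `os.path.ismount` first matters: if the NAS is not mounted, rsync would otherwise write into the empty mount-point directory on the local disk.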
### File Organization
- [ ] Fix file naming convention to match spec format
  - Change from: `update_20241218_060000.md`
  - To: `hvacknowitall_<source>_2024-12-18-T060000.md`
- [ ] Create proper directory structure
  ```
  data/
  ├── markdown_current/
  ├── markdown_archives/
  │   ├── WordPress/
  │   ├── Instagram/
  │   ├── YouTube/
  │   ├── Podcast/
  │   └── MailChimp/
  ├── media/
  │   ├── WordPress/
  │   ├── Instagram/
  │   ├── YouTube/
  │   ├── Podcast/
  │   └── MailChimp/
  └── .state/
  ```
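
A sketch of the naming and layout helpers. `spec_filename` and `create_layout` are hypothetical names; lowercasing the source in the filename is an assumption, since the spec only shows a `<source>` placeholder.

```python
"""Output naming and directory layout helpers (hypothetical sketch)."""
from datetime import datetime
from pathlib import Path

# Per-source folders from the layout above.
SOURCES = ("WordPress", "Instagram", "YouTube", "Podcast", "MailChimp")


def spec_filename(source: str, when: datetime) -> str:
    """Build e.g. hvacknowitall_wordpress_2024-12-18-T060000.md."""
    return f"hvacknowitall_{source.lower()}_{when:%Y-%m-%d-T%H%M%S}.md"


def create_layout(root: Path) -> None:
    """Create data/ with per-source archive and media folders; safe to re-run."""
    dirs = [root / "markdown_current", root / ".state"]
    for source in SOURCES:
        dirs += [root / "markdown_archives" / source, root / "media" / source]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
```

Calling `create_layout(Path("/opt/hvac-kia-content/data"))` at startup ensures every folder exists before any scraper writes to it.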
### Content Processing
- [ ] Implement media downloading for all sources
  - YouTube thumbnails and videos (optional)
  - Instagram images and videos
  - WordPress featured images
  - Podcast episode artwork
- [ ] Standardize markdown output format to specification
  ```markdown
  # ID: [unique_identifier]
  ## Title: [content_title]
  ## Type: [content_type]
  ## Permalink: [url]
  ## Description:
  [content_description]
  ## Metadata:
  ### Comments: [count]
  ### Likes: [count]
  ### Tags:
  - tag1
  - tag2
  ```
- [ ] Add MarkItDown package for proper markdown conversion
  - Install markitdown
  - Replace custom formatting logic
  - Test output quality
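
A minimal renderer for the specified layout, assuming a hypothetical `ContentItem` container with these field names. MarkItDown would sit upstream of this, converting raw HTML descriptions to markdown before they are placed in the `Description` section; that step is not shown.

```python
"""Render one content item in the specified markdown layout (sketch)."""
from dataclasses import dataclass, field


@dataclass
class ContentItem:  # hypothetical container; field names are assumptions
    id: str
    title: str
    content_type: str
    permalink: str
    description: str = ""
    comments: int = 0
    likes: int = 0
    tags: list[str] = field(default_factory=list)


def to_spec_markdown(item: ContentItem) -> str:
    """Emit the headings in the exact order the specification lists them."""
    lines = [
        f"# ID: {item.id}",
        f"## Title: {item.title}",
        f"## Type: {item.content_type}",
        f"## Permalink: {item.permalink}",
        "## Description:",
        item.description,
        "## Metadata:",
        f"### Comments: {item.comments}",
        f"### Likes: {item.likes}",
        "### Tags:",
    ]
    lines += [f"- {tag}" for tag in item.tags]
    return "\n".join(lines) + "\n"
```

Keeping the template in one function gives the end-to-end tests a single place to check the output format.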
### Security Enhancements
- [ ] Implement user agent rotation for web scrapers
  - Create a user agent pool
  - Rotate on each request
  - Add to Instagram and TikTok scrapers
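
A thread-safe rotation sketch for the HTTP-based scrapers. The user agent strings are placeholders and `UserAgentRotator` is a hypothetical name; wiring it into the TikTok browser automation depends on how Scrapling is configured and is not shown.

```python
"""User agent rotation sketch; the strings below are placeholders, not a vetted pool."""
import itertools
import random
import threading

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


class UserAgentRotator:
    """Hands out the next user agent in the pool on every call."""

    def __init__(self, pool, shuffle: bool = True):
        pool = list(pool)
        if not pool:
            raise ValueError("user agent pool is empty")
        if shuffle:
            random.shuffle(pool)  # so separate processes don't all start on the same UA
        self._cycle = itertools.cycle(pool)
        self._lock = threading.Lock()  # scrapers may share one rotator across threads

    def next_agent(self) -> str:
        with self._lock:
            return next(self._cycle)

    def headers(self) -> dict:
        """Headers dict to pass with a single request."""
        return {"User-Agent": self.next_agent()}
```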
---
## Phase 2: Testing Suite
**Priority: HIGH - Required by specification**

**Timeline: Week 1-2**

### Unit Testing
- [ ] Create pytest unit tests with mocking
  - Test each scraper independently
  - Mock external API calls
  - Test state management
  - Test markdown conversion
  - Test error handling
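
An example of the mocking pattern, assuming scrapers receive their HTTP session through the constructor. `WordPressScraper` here is a stand-in, not the project's class; the endpoint is the standard WordPress REST route `/wp-json/wp/v2/posts`. Running `pytest` on a file containing these functions collects both tests.

```python
"""pytest-style unit tests with a mocked HTTP layer (class and names are hypothetical)."""
from unittest.mock import MagicMock


class WordPressScraper:
    """Stand-in scraper: fetches posts through an injected session."""

    def __init__(self, session, base_url: str):
        self.session = session
        self.base_url = base_url.rstrip("/")

    def fetch_titles(self) -> list[str]:
        resp = self.session.get(f"{self.base_url}/wp-json/wp/v2/posts", timeout=30)
        resp.raise_for_status()
        return [post["title"]["rendered"] for post in resp.json()]


def test_fetch_titles_parses_api_response():
    session = MagicMock()  # no network: every call returns canned data
    session.get.return_value.json.return_value = [
        {"title": {"rendered": "Refrigerant Basics"}},
        {"title": {"rendered": "Duct Sizing"}},
    ]
    scraper = WordPressScraper(session, "https://example.com/")
    assert scraper.fetch_titles() == ["Refrigerant Basics", "Duct Sizing"]
    session.get.assert_called_once_with(
        "https://example.com/wp-json/wp/v2/posts", timeout=30
    )


def test_fetch_titles_propagates_http_errors():
    session = MagicMock()
    session.get.return_value.raise_for_status.side_effect = RuntimeError("503")
    scraper = WordPressScraper(session, "https://example.com")
    # pytest.raises(RuntimeError) is the idiomatic form; plain try runs without pytest.
    try:
        scraper.fetch_titles()
    except RuntimeError:
        return
    raise AssertionError("HTTP error was swallowed")
```

Injecting the session is what makes this work: the test never touches the network, and the same seam accepts the pooled `requests.Session` planned in Phase 3.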
### Integration Testing
- [ ] Create integration tests for parallel processing
  - Test ThreadPoolExecutor functionality
  - Test file archiving
  - Test rsync functionality
  - Test scheduling logic
### End-to-End Testing
- [ ] Create end-to-end tests with mock data
  - Full workflow simulation
  - Verify markdown output format
  - Verify file naming and placement
  - Test incremental updates
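
Incremental-update tests need a concrete state mechanism to assert against. The sketch below assumes state is a per-source JSON file of already-seen IDs under `data/.state/`; the project's real state format may differ (for example, a last-published timestamp). The state file is replaced atomically so an interrupted run cannot leave it half-written.

```python
"""Incremental-update sketch: per-source seen-ID state (assumed format)."""
import json
import os
from pathlib import Path


def _state_path(state_dir: Path, source: str) -> Path:
    return state_dir / f"{source}.json"


def load_seen(state_dir: Path, source: str) -> set[str]:
    path = _state_path(state_dir, source)
    if not path.exists():
        return set()  # first run: everything is new
    return set(json.loads(path.read_text())["seen_ids"])


def save_seen(state_dir: Path, source: str, seen: set[str]) -> None:
    state_dir.mkdir(parents=True, exist_ok=True)
    path = _state_path(state_dir, source)
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps({"seen_ids": sorted(seen)}))
    os.replace(tmp, path)  # atomic rename on POSIX


def new_items(items: list[dict], state_dir: Path, source: str) -> list[dict]:
    """Return items not seen on earlier runs and record them as seen."""
    seen = load_seen(state_dir, source)
    fresh = [item for item in items if item["id"] not in seen]
    save_seen(state_dir, source, seen | {item["id"] for item in fresh})
    return fresh
```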
---
## Phase 3: Fix Critical Production Issues
**Priority: CRITICAL - Security & reliability**

**Timeline: Week 2**

### Systemd Service Fixes
- [ ] Fix hardcoded paths in systemd services
  - Replace `User=ben` with a configurable user
  - Replace `/home/ben/dev/hvac-kia-content` with `/opt/hvac-kia-content`
  - Use a template unit (e.g. `User=%i`) or substitute values at install time; systemd does not expand environment variables in `User=`
- [ ] Remove hardcoded DISPLAY/XAUTHORITY from systemd services
  - Move to separate environment file
  - Only load for TikTok-specific service
  - Document display server requirements
### Startup Validation
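- [ ] Add environment variable validation on startup
  ```python
  import os


  def validate_environment() -> None:
      """Fail fast at startup if any required variable is missing or empty."""
      required = [
          'WORDPRESS_USERNAME', 'WORDPRESS_API_KEY',
          'YOUTUBE_CHANNEL_URL', 'INSTAGRAM_USERNAME',
          'INSTAGRAM_PASSWORD',
      ]
      # os.getenv returns None for unset vars; an empty string also counts as missing.
      missing = [k for k in required if not os.getenv(k)]
      if missing:
          raise ValueError(f"Missing required env vars: {', '.join(missing)}")
  ```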
### Error Handling & Recovery
- [ ] Implement retry logic using configured `RETRY_CONFIG`
  - Add tenacity library
  - Wrap network calls with retry decorator
  - Use exponential backoff settings
- [ ] Add HTTP connection pooling with `requests.Session`
  - Create session in `base_scraper.__init__`
  - Reuse session across requests
  - Configure connection pool size
- [ ] Fix error isolation (don't crash orchestrator on single failure)
  - Continue processing other scrapers
  - Collect all errors for reporting
  - Return partial results
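
A stdlib-only sketch of retry with exponential backoff and of error isolation, standing in for tenacity (whose `@retry(stop=stop_after_attempt(...), wait=wait_exponential(...))` covers the same ground). The `RETRY_CONFIG` keys here are hypothetical. `requests` raises its own `requests.exceptions.RequestException` hierarchy rather than the built-in `ConnectionError`, so real scrapers would pass that to `retry_on`. Connection pooling (a shared `requests.Session` with a mounted `HTTPAdapter`) is not shown.

```python
"""Retry with exponential backoff plus per-scraper error isolation (stdlib-only sketch)."""
import functools
import logging
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger("orchestrator")

# Hypothetical stand-in for RETRY_CONFIG; the real keys may differ.
RETRY_CONFIG = {"max_attempts": 3, "initial_delay": 2.0, "backoff_factor": 2.0, "max_delay": 30.0}


def with_retry(config=RETRY_CONFIG, retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Decorator: retry on retry_on exceptions, multiplying the wait each time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = config["initial_delay"]
            for attempt in range(1, config["max_attempts"] + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on as exc:
                    if attempt == config["max_attempts"]:
                        raise  # out of attempts: surface the last error
                    log.warning("%s failed (%s); retry %d in %.1fs", func.__name__, exc, attempt, delay)
                    sleep(delay)
                    delay = min(delay * config["backoff_factor"], config["max_delay"])
        return wrapper
    return decorator


def run_all(scrapers: dict, max_workers: int = 4):
    """Run every scraper in parallel; one failure never stops the others.
    Returns (results, errors) keyed by scraper name, i.e. partial results."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in scrapers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # isolate: record the error and keep going
                log.error("scraper %s failed: %s", name, exc)
                errors[name] = exc
    return results, errors
```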
---
## Phase 4: Production Hardening
**Priority: HIGH - Operations & monitoring**

**Timeline: Week 2-3**

### Monitoring & Alerting
- [ ] Implement health check monitoring and alerting
  - Send ping to healthcheck URL on success
  - Email alerts on critical failures
  - Track metrics (items processed, errors, duration)
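
A sketch of the success/failure ping with basic run metrics. The URL convention (plain URL for success, a `/fail` suffix for failure, optional POST body) follows Healthchecks.io and is an assumption about which service is used. Such services can turn a missed or failed ping into an email alert, which may cover the failure-email task without SMTP code in the aggregator. `RunMetrics`, `ping_url`, and `send_ping` are hypothetical names.

```python
"""Healthcheck ping + run metrics sketch (Healthchecks.io-style URLs assumed)."""
import logging
import time
import urllib.request
from dataclasses import dataclass, field

log = logging.getLogger("monitoring")


@dataclass
class RunMetrics:
    items_processed: int = 0
    errors: int = 0
    started: float = field(default_factory=time.monotonic)

    def duration(self) -> float:
        return time.monotonic() - self.started


def ping_url(base_url: str, success: bool) -> str:
    """Plain URL signals success; the /fail suffix signals failure."""
    return base_url.rstrip("/") + ("" if success else "/fail")


def send_ping(base_url: str, metrics: RunMetrics, timeout: float = 10) -> bool:
    """POST a one-line summary. Never raises: monitoring must not break a run."""
    body = (
        f"items={metrics.items_processed} errors={metrics.errors} "
        f"duration={metrics.duration():.1f}s"
    ).encode()
    req = urllib.request.Request(
        ping_url(base_url, metrics.errors == 0), data=body, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        log.warning("healthcheck ping failed: %s", exc)
        return False
```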
### Logging Improvements
- [ ] Add log rotation with `RotatingFileHandler`
  - Configure max file size (10 MB)
  - Keep 5 backup files
  - Implement for each source
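
A per-source logger with the rotation settings listed above (10 MB, 5 backups). `get_source_logger` and the `hvac.<source>` logger names are hypothetical.

```python
"""Per-source rotating logs: 10 MB per file, 5 backups."""
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # 10 MB
BACKUP_COUNT = 5


def get_source_logger(source: str, log_dir: Path) -> logging.Logger:
    """Return a logger writing to <log_dir>/<source>.log with size-based rotation."""
    logger = logging.getLogger(f"hvac.{source}")
    if not logger.handlers:  # getLogger returns the same object every call
        log_dir.mkdir(parents=True, exist_ok=True)
        handler = RotatingFileHandler(
            log_dir / f"{source}.log",
            maxBytes=MAX_BYTES,
            backupCount=BACKUP_COUNT,
            encoding="utf-8",
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

The `if not logger.handlers` guard matters because `logging.getLogger` returns the same logger object on every call; without it, each call would add another handler and duplicate every log line.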
### Input Validation
- [ ] Add input validation for configuration values
  - Validate numeric values are positive
  - Check rate limits are reasonable
  - Verify paths exist and are writable
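
A sketch that collects every problem instead of stopping at the first, so one startup log shows all misconfigurations. The config keys and the 0.5-60 s rate-limit window are hypothetical examples of "reasonable" bounds, not values from the spec.

```python
"""Configuration validation sketch; keys and limits are assumptions."""
import os
from pathlib import Path


def validate_config(config: dict) -> list[str]:
    """Return every problem found (an empty list means the config is valid)."""
    problems = []
    # Numeric settings must be positive; bool is excluded because True == 1.
    for key in ("max_workers", "request_timeout", "max_items"):
        value = config.get(key)
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"{key} must be a positive number, got {value!r}")
    # Rate limit in seconds between requests: too low risks bans, too high stalls runs.
    delay = config.get("rate_limit_delay")
    if isinstance(delay, bool) or not isinstance(delay, (int, float)) or not 0.5 <= delay <= 60:
        problems.append(f"rate_limit_delay should be 0.5-60 seconds, got {delay!r}")
    # Paths must exist and be writable by the service user.
    for key in ("data_dir", "log_dir"):
        raw = config.get(key)
        if not raw:
            problems.append(f"{key} is not set")
        elif not Path(raw).is_dir():
            problems.append(f"{key} is not an existing directory: {raw}")
        elif not os.access(raw, os.W_OK):
            problems.append(f"{key} is not writable: {raw}")
    return problems
```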
---
## Phase 5: Documentation & Deployment
**Priority: MEDIUM - Final preparation**

**Timeline: Week 3**

### Documentation
- [ ] Document why systemd was chosen over Kubernetes
  - TikTok requires display server access
  - Browser automation is incompatible with containers
  - Add to README and architecture docs
- [ ] Create production deployment checklist
  - Pre-deployment verification steps
  - Configuration validation
  - Rollback procedures
- [ ] Create rollback procedures and documentation
  - Backup current version
  - Database/state rollback steps
  - Service restoration process
### Testing & Monitoring
- [ ] Test full production deployment in a staging environment
  - Clone production config
  - Run for 24 hours
  - Verify all sources are working
- [ ] Set up monitoring dashboards and alerts
  - Grafana dashboard for metrics
  - Alert rules for failures
  - Disk usage monitoring
---
## Implementation Priority
### 🔴 Critical (Do First)
1. Fix hardcoded paths in systemd services
2. Add environment variable validation
3. Enable NAS sync
4. Fix error isolation
5. Fix scheduling times
### 🟠 High Priority (Do Second)
6. Implement retry logic
7. Add connection pooling
8. Create pytest unit tests
9. Implement health monitoring
10. Add log rotation
### 🟡 Medium Priority (Do Third)
11. Fix file naming convention
12. Create proper directory structure
13. Standardize markdown format
14. Implement media downloading
15. Add MarkItDown package
### 🟢 Nice to Have (If Time Permits)
16. User agent rotation
17. Integration tests
18. End-to-end tests
19. Monitoring dashboards
20. Comprehensive documentation
---
## Success Criteria
### Minimum Viable Production
- [x] All scrapers functional
- [x] Incremental updates working
- [ ] NAS sync enabled
- [ ] Proper error handling
- [ ] Systemd services portable
- [ ] Environment validation
- [ ] Basic monitoring
### Full Production Ready
- [ ] All specification requirements met
- [ ] Comprehensive test suite
- [ ] Full monitoring and alerting
- [ ] Complete documentation
- [ ] Rollback procedures
- [ ] 99% uptime capability
---
## Notes
### Why Not Docker/Kubernetes?
TikTok scraping requires a display server (X11/Wayland) for browser automation with Scrapling. This makes containerization impractical as containers don't have native display server access. Systemd provides adequate service management for this use case.
### Current Gaps from Specification
1. **Scheduling**: Currently 6 AM/6 PM, spec requires 8 AM/12 PM
2. **NAS Sync**: Implemented but not activated
3. **Media Downloads**: Not implemented
4. **File Naming**: Simplified format used
5. **Directory Structure**: Flat structure instead of source-separated
6. **Testing**: Manual tests only, no pytest suite
7. **Markdown Format**: Custom format instead of specified structure
### Estimated Timeline
- **Week 1**: Critical fixes and spec compliance
- **Week 2**: Testing and error handling
- **Week 3**: Monitoring and documentation
- **Total**: 3 weeks to full production readiness
---
## Quick Start Commands
```bash
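# Step 1: Make systemd services portable (Phase 3)
# Double quotes let the shell substitute SERVICE_USER (which must be set);
# systemd itself does not expand environment variables in User=.
sed -i "s/User=ben/User=${SERVICE_USER}/g" systemd/*.service
sed -i 's|/home/ben/dev|/opt|g' systemd/*.service
# Step 2: Enable NAS sync (Phase 1)
# Add orchestrator.sync_to_nas() to run_production.py by hand, right after the
# scrapers finish. Appending it with `echo ... >>` puts the call at the end of
# the file, which may be outside the scope where `orchestrator` exists.
# Step 3: Fix scheduling (Phase 1)
# OnCalendar= uses the host's local time zone unless one is appended.
sed -i 's/06:00:00/08:00:00/g' systemd/*.timer
sed -i 's/18:00:00/12:00:00/g' systemd/*.timer
# Step 4: Test deployment
./install_production.sh
systemctl status hvac-content-aggregator.timer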
```
---
*Last Updated: 2024-12-18*
*Version: 1.0*