hvac-kia-content/docs/PRODUCTION_TODO.md
Ben Reed 7e5377e7b1 docs: Update all documentation to use hkia naming convention
Documentation Updates:
- Updated project specification with hkia naming and paths
- Modified all markdown documentation files (12 files updated)
- Changed service names from hvac-content-* to hkia-content-*
- Updated NAS paths from /mnt/nas/hvacknowitall to /mnt/nas/hkia
- Replaced all instances of "HVAC Know It All" with "HKIA"

Files Updated:
- README.md - Updated service names and commands
- CLAUDE.md - Updated environment variables and paths
- DEPLOY.md - Updated deployment instructions
- docs/project_specification.md - Updated naming convention specs
- docs/status.md - Updated project status with new naming
- docs/final_status.md - Updated completion status
- docs/deployment_strategy.md - Updated deployment paths
- docs/DEPLOYMENT_CHECKLIST.md - Updated checklist items
- docs/PRODUCTION_TODO.md - Updated production tasks
- BACKLOG_STATUS.md - Updated backlog references
- UPDATED_CAPTURE_STATUS.md - Updated capture status
- FINAL_TALLY_REPORT.md - Updated tally report

Notes:
- Repository name remains hvacknowitall-content (unchanged)
- Project directory remains hvac-kia-content (unchanged)
- All user-facing outputs now use clean "hkia" naming

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-19 13:40:27 -03:00


# Production Readiness Todo List
## Overview
This document outlines all tasks required to meet the original specification and prepare the HKIA Content Aggregator for production deployment. Tasks are organized by priority and phase.

**Note:** Docker/Kubernetes deployment is not feasible because TikTok scraping requires display server access. The system uses systemd for service management instead.

---
## Phase 1: Meet Original Specification
**Priority: CRITICAL - Core functionality gaps**

**Timeline: Week 1**
### Scheduling & Timing
- [ ] Fix scheduling times to match spec (8 AM & 12 PM ADT instead of 6 AM & 6 PM)
  - Update systemd timer files
  - Update production configuration
  - Test timer activation
### Data Synchronization
- [ ] Enable NAS sync in production runner
  - Add `orchestrator.sync_to_nas()` call
  - Verify NAS mount path (`/mnt/nas/hkia`)
  - Test rsync functionality
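
A minimal sketch of a guarded sync step. It assumes the NAS is mounted at `/mnt/nas/hkia` and that the local `data/` tree is the source. The function names and rsync flags are illustrative and are not the existing `orchestrator.sync_to_nas()` implementation. The mount check is the important part: without it, an offline NAS leaves an empty mount-point directory that rsync would quietly fill on the local disk.

```python
import os
import subprocess
from pathlib import Path

NAS_ROOT = Path("/mnt/nas/hkia")  # assumed mount point; adjust per host


def build_rsync_cmd(src: Path, dest: Path) -> list[str]:
    """Build an rsync command that mirrors the contents of src into dest.

    -a (archive) recurses and preserves permissions and timestamps; -v lists
    transferred files. The trailing slash on src copies its contents rather
    than the directory itself.
    """
    return ["rsync", "-av", f"{src}/", f"{dest}/"]


def sync_to_nas(src: Path, nas_root: Path = NAS_ROOT) -> None:
    """Hypothetical sync step: refuse to run unless the NAS is really mounted."""
    if not os.path.ismount(nas_root):
        raise RuntimeError(f"NAS not mounted at {nas_root}")
    # check=True turns a non-zero rsync exit status into CalledProcessError.
    subprocess.run(build_rsync_cmd(src, nas_root), check=True)
```

Raising instead of logging lets the caller decide whether a failed sync should fail the whole run.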
### File Organization
- [ ] Fix file naming convention to match spec format
  - Change from: `update_20241218_060000.md`
  - To: `hkia_<source>_2024-12-18-T060000.md`
- [ ] Create proper directory structure
  ```
  data/
  ├── markdown_current/
  ├── markdown_archives/
  │   ├── WordPress/
  │   ├── Instagram/
  │   ├── YouTube/
  │   ├── Podcast/
  │   └── MailChimp/
  ├── media/
  │   ├── WordPress/
  │   ├── Instagram/
  │   ├── YouTube/
  │   ├── Podcast/
  │   └── MailChimp/
  └── .state/
  ```
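
The naming rule and the directory tree above can be sketched as two small helpers. This assumes `<source>` is the lowercased source name. That is an assumption, since the spec only shows the placeholder. The helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path

# Source folders taken from the directory tree above.
SOURCES = ["WordPress", "Instagram", "YouTube", "Podcast", "MailChimp"]


def spec_filename(source: str, run_time: datetime) -> str:
    """Return a spec-style name such as hkia_youtube_2024-12-18-T060000.md.

    Lowercasing the source is an assumption, not something the spec states.
    """
    return f"hkia_{source.lower()}_{run_time:%Y-%m-%d-T%H%M%S}.md"


def create_directory_structure(base: Path) -> None:
    """Create the data/ layout; exist_ok=True makes repeated runs harmless."""
    (base / "markdown_current").mkdir(parents=True, exist_ok=True)
    (base / ".state").mkdir(parents=True, exist_ok=True)
    for top in ("markdown_archives", "media"):
        for source in SOURCES:
            (base / top / source).mkdir(parents=True, exist_ok=True)
```

Calling `create_directory_structure(Path("data"))` at startup is idempotent, so it can run before every scrape.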
### Content Processing
- [ ] Implement media downloading for all sources
  - YouTube thumbnails and videos (optional)
  - Instagram images and videos
  - WordPress featured images
  - Podcast episode artwork
- [ ] Standardize markdown output format to specification
  ```markdown
  # ID: [unique_identifier]
  ## Title: [content_title]
  ## Type: [content_type]
  ## Permalink: [url]
  ## Description:
  [content_description]
  ## Metadata:
  ### Comments: [count]
  ### Likes: [count]
  ### Tags:
  - tag1
  - tag2
  ```
- [ ] Add MarkItDown package for proper markdown conversion
  - Install `markitdown`
  - Replace custom formatting logic
  - Test output quality
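
A minimal renderer for the specified layout. It assumes each scraped item arrives as a dict with the keys shown. Those key names are hypothetical, not the scrapers' actual fields. MarkItDown would still handle converting HTML bodies into the description text. This helper only arranges the fields.

```python
def render_item(item: dict) -> str:
    """Render one content item in the spec's markdown layout.

    Expected keys (assumed): id, title, type, permalink, description,
    and optionally comments, likes, tags.
    """
    lines = [
        f"# ID: {item['id']}",
        f"## Title: {item['title']}",
        f"## Type: {item['type']}",
        f"## Permalink: {item['permalink']}",
        "## Description:",
        item["description"],
        "## Metadata:",
        f"### Comments: {item.get('comments', 0)}",
        f"### Likes: {item.get('likes', 0)}",
        "### Tags:",
    ]
    # One bullet per tag, matching the spec's "- tag1" lines.
    lines.extend(f"- {tag}" for tag in item.get("tags", []))
    return "\n".join(lines) + "\n"
```

Missing counts default to 0 so every item emits the same set of headings, which keeps downstream parsing simple.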
### Security Enhancements
- [ ] Implement user agent rotation for web scrapers
  - Create user agent pool
  - Rotate on each request
  - Add to Instagram and TikTok scrapers
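
A round-robin sketch of "rotate on each request". The pool strings below are example browser user agents, not a curated list. The class name is hypothetical. The returned dict can be passed as `headers=` to a `requests` call. Round-robin guarantees every agent in the pool is used evenly. `random.choice` would be the alternative if a less predictable order were preferred.

```python
import itertools

# Example pool; keep real deployments in sync with current browser releases.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
]


class UserAgentRotator:
    """Hand out the next User-Agent header from the pool on every call."""

    def __init__(self, agents: list[str]):
        if not agents:
            raise ValueError("user agent pool must not be empty")
        self._cycle = itertools.cycle(agents)  # endless round-robin iterator

    def headers(self) -> dict[str, str]:
        return {"User-Agent": next(self._cycle)}
```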
---
## Phase 2: Testing Suite
**Priority: HIGH - Required by specification**

**Timeline: Week 1-2**
### Unit Testing
- [ ] Create pytest unit tests with mocking
  - Test each scraper independently
  - Mock external API calls
  - Test state management
  - Test markdown conversion
  - Test error handling
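
A minimal sketch of "mock external API calls". The helper `fetch_post_titles` and the WordPress-style `title.rendered` field are hypothetical stand-ins for a real scraper method. A `unittest.mock.Mock` replaces the `requests.Session`, so the tests never touch the network. pytest collects `unittest.TestCase` classes as well as plain `test_*` functions, so this style runs under pytest unchanged.

```python
import unittest
from unittest import mock


def fetch_post_titles(session, api_url: str) -> list[str]:
    """Hypothetical scraper helper: fetch a JSON list of posts and return titles."""
    response = session.get(api_url, timeout=30)
    response.raise_for_status()
    return [post["title"]["rendered"] for post in response.json()]


class FetchPostTitlesTest(unittest.TestCase):
    def test_parses_titles_without_network(self):
        session = mock.Mock()  # stands in for requests.Session
        session.get.return_value.json.return_value = [
            {"title": {"rendered": "Heat pump basics"}},
            {"title": {"rendered": "Refrigerant charging"}},
        ]
        titles = fetch_post_titles(session, "https://example.com/wp-json/wp/v2/posts")
        self.assertEqual(titles, ["Heat pump basics", "Refrigerant charging"])
        session.get.assert_called_once_with(
            "https://example.com/wp-json/wp/v2/posts", timeout=30
        )

    def test_http_errors_propagate(self):
        session = mock.Mock()
        # Simulate the HTTP layer raising on a 5xx response.
        session.get.return_value.raise_for_status.side_effect = RuntimeError("HTTP 503")
        with self.assertRaises(RuntimeError):
            fetch_post_titles(session, "https://example.com/wp-json/wp/v2/posts")
```

Injecting the session as a parameter is what makes the mock possible. A scraper that builds its own session internally would need `mock.patch` instead.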
### Integration Testing
- [ ] Create integration tests for parallel processing
  - Test ThreadPoolExecutor functionality
  - Test file archiving
  - Test rsync functionality
  - Test scheduling logic
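
An integration test for file archiving can run against a throwaway directory instead of the real `data/` tree. The `archive_current` function below is a hypothetical archiver. It assumes archived files are matched by an `hkia_<source>_` prefix. The test checks both that matching files move and that other sources' files stay put.

```python
import shutil
import tempfile
from pathlib import Path


def archive_current(base: Path, source: str) -> list[Path]:
    """Hypothetical archiver: move one source's current files into its archive folder."""
    archive_dir = base / "markdown_archives" / source
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted((base / "markdown_current").glob(f"hkia_{source.lower()}_*.md")):
        target = archive_dir / path.name
        shutil.move(str(path), target)
        moved.append(target)
    return moved


def test_archive_moves_only_matching_source():
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        current = base / "markdown_current"
        current.mkdir()
        (current / "hkia_youtube_2024-12-18-T080000.md").write_text("# ID: a\n")
        (current / "hkia_podcast_2024-12-18-T080000.md").write_text("# ID: b\n")

        moved = archive_current(base, "YouTube")

        assert [p.name for p in moved] == ["hkia_youtube_2024-12-18-T080000.md"]
        assert moved[0].exists()
        assert not (current / "hkia_youtube_2024-12-18-T080000.md").exists()
        # The other source's file must stay in markdown_current.
        assert (current / "hkia_podcast_2024-12-18-T080000.md").exists()
```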
### End-to-End Testing
- [ ] Create end-to-end tests with mock data
  - Full workflow simulation
  - Verify markdown output format
  - Verify file naming and placement
  - Test incremental updates
---
## Phase 3: Fix Critical Production Issues
**Priority: CRITICAL - Security & reliability**

**Timeline: Week 2**
### Systemd Service Fixes
- [ ] Fix hardcoded paths in systemd services
  - Replace `User=ben` with configurable user
  - Replace `/home/ben/dev/hvac-kia-content` with `/opt/hvac-kia-content`
  - Fill these values in at install time via templating (systemd does not expand environment variables in `User=`)
- [ ] Remove hardcoded `DISPLAY`/`XAUTHORITY` from systemd services
  - Move to separate environment file
  - Only load for TikTok-specific service
  - Document display server requirements
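
One way to template the units at install time is with the standard library's `string.Template`. The unit layout, the `/usr/bin/python3` interpreter path, and the `/etc/hkia/display.env` location are assumptions for illustration, not the repository's actual files. Only units that need a display load the environment file, which keeps `DISPLAY`/`XAUTHORITY` out of every other service.

```python
from string import Template

# Hypothetical unit layout; interpreter and env-file paths are assumptions.
UNIT_TEMPLATE = Template("""\
[Unit]
Description=HKIA content aggregator ($source)

[Service]
Type=oneshot
User=$service_user
WorkingDirectory=$install_dir
ExecStart=/usr/bin/python3 $install_dir/run_production.py
$extra
""")

DISPLAY_ENV_FILE = "/etc/hkia/display.env"  # hypothetical location


def render_unit(source: str, service_user: str, install_dir: str,
                needs_display: bool = False) -> str:
    """Fill in install-time values; only display-dependent units get the env file."""
    extra = f"EnvironmentFile={DISPLAY_ENV_FILE}" if needs_display else ""
    return UNIT_TEMPLATE.substitute(
        source=source,
        service_user=service_user,
        install_dir=install_dir,
        extra=extra,
    )
```

The installer could write each rendered string into `/etc/systemd/system/` and then run `systemctl daemon-reload`. `substitute` raises `KeyError` on any missing placeholder, so an incomplete template fails loudly instead of producing a half-filled unit.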
### Startup Validation
- [ ] Add environment variable validation on startup
  ```python
  import os

  REQUIRED_ENV_VARS = [
      'WORDPRESS_USERNAME', 'WORDPRESS_API_KEY',
      'YOUTUBE_CHANNEL_URL', 'INSTAGRAM_USERNAME',
      'INSTAGRAM_PASSWORD',
  ]

  def validate_environment():
      """Fail fast at startup if any required variable is unset or empty."""
      missing = [k for k in REQUIRED_ENV_VARS if not os.getenv(k)]
      if missing:
          raise ValueError(f"Missing required env vars: {', '.join(missing)}")
  ```
### Error Handling & Recovery
- [ ] Implement retry logic using configured `RETRY_CONFIG`
  - Add tenacity library
  - Wrap network calls with retry decorator
  - Use exponential backoff settings
- [ ] Add HTTP connection pooling with `requests.Session`
  - Create session in `base_scraper.__init__`
  - Reuse session across requests
  - Configure connection pool size
- [ ] Fix error isolation (a single scraper failure must not crash the orchestrator)
  - Continue processing other scrapers
  - Collect all errors for reporting
  - Return partial results
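
A minimal sketch of error isolation around parallel scrapers, assuming each scraper can be wrapped as a zero-argument callable. The `run_scrapers` name and the dict-of-callables interface are hypothetical. Each future's exception is caught individually, so one failure is recorded while the others keep running. The caller receives both the partial results and the collected errors for reporting. Retries would sit inside each scraper's own network calls (for example with tenacity's `retry`, `stop_after_attempt` and `wait_exponential`), so this outer layer only sees failures that survived retrying.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger("hkia.orchestrator")


def run_scrapers(scrapers: dict, max_workers: int = 4) -> tuple[dict, dict]:
    """Run scraper callables in parallel; one failure never stops the rest.

    Returns (results, errors): results maps name -> return value for the
    scrapers that succeeded, errors maps name -> the exception raised.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in scrapers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()  # re-raises the scraper's exception
            except Exception as exc:  # isolate: record the failure and keep going
                log.exception("scraper %s failed", name)
                errors[name] = exc
    return results, errors
```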
---
## Phase 4: Production Hardening
**Priority: HIGH - Operations & monitoring**

**Timeline: Week 2-3**
### Monitoring & Alerting
- [ ] Implement health check monitoring and alerting
  - Send ping to healthcheck URL on success
  - Email alerts on critical failures
  - Track metrics (items processed, errors, duration)
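
A sketch of the success/failure ping using only the standard library. It assumes a Healthchecks.io-style endpoint: a plain GET on success, and the same URL with `/fail` appended on failure. The URL comes from configuration, and the function names are hypothetical. Network errors are swallowed and reported as `False`, so an unreachable monitoring service cannot fail a scraping run that otherwise succeeded.

```python
import urllib.request


def healthcheck_url(base_url: str, success: bool) -> str:
    """Success pings the base URL; failures append /fail (Healthchecks.io style)."""
    return base_url.rstrip("/") + ("" if success else "/fail")


def ping_healthcheck(base_url: str, success: bool, timeout: float = 10) -> bool:
    """Send the ping; return False instead of raising if it cannot be delivered."""
    try:
        with urllib.request.urlopen(healthcheck_url(base_url, success), timeout=timeout):
            return True
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False
```

Calling `ping_healthcheck(url, success=not errors)` at the end of a run lets the monitoring service alert both on explicit failures and on missed pings.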
### Logging Improvements
- [ ] Add log rotation with `RotatingFileHandler`
  - Configure max file size (10MB)
  - Keep 5 backup files
  - Use a separate rotating log for each source
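
The rotation settings above map directly onto `logging.handlers.RotatingFileHandler`. This sketch gives each source its own logger and file. The `hkia.<source>` logger names and the `<source>.log` file layout are assumptions. The handler guard prevents duplicate lines when the function is called more than once for the same source.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # rotate once a file reaches 10 MB
BACKUP_COUNT = 5  # keep <source>.log.1 through <source>.log.5


def get_source_logger(source: str, log_dir: Path) -> logging.Logger:
    """Return a per-source logger writing to <log_dir>/<source>.log with rotation."""
    logger = logging.getLogger(f"hkia.{source}")
    if not logger.handlers:  # attach the handler only once per logger
        log_dir.mkdir(parents=True, exist_ok=True)
        handler = RotatingFileHandler(
            log_dir / f"{source}.log", maxBytes=MAX_BYTES, backupCount=BACKUP_COUNT
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```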
### Input Validation
- [ ] Add input validation for configuration values
  - Validate numeric values are positive
  - Check rate limits are reasonable
  - Verify paths exist and are writable
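
A sketch of the three checks that collects every problem instead of stopping at the first, so one startup run reports everything wrong with the config. The key names (`MAX_ITEMS`, `RATE_LIMIT_PER_MINUTE`, `DATA_DIR`) and the 60-per-minute ceiling are hypothetical stand-ins for whatever the production configuration actually defines. At startup, a non-empty result would be logged and the process would exit non-zero.

```python
import os
from pathlib import Path

# Hypothetical key names and ceiling; substitute the real production config keys.
POSITIVE_KEYS = ("MAX_ITEMS", "RATE_LIMIT_PER_MINUTE")
MAX_RATE_PER_MINUTE = 60


def _is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for key in POSITIVE_KEYS:
        value = config.get(key)
        if not _is_number(value) or value <= 0:
            problems.append(f"{key} must be a positive number, got {value!r}")
    rate = config.get("RATE_LIMIT_PER_MINUTE")
    if _is_number(rate) and rate > MAX_RATE_PER_MINUTE:
        problems.append(f"RATE_LIMIT_PER_MINUTE={rate} exceeds {MAX_RATE_PER_MINUTE}")
    data_dir = config.get("DATA_DIR")
    # Check for a missing value first: Path("") would resolve to the current directory.
    if not data_dir or not Path(data_dir).is_dir() or not os.access(data_dir, os.W_OK):
        problems.append(f"DATA_DIR {data_dir!r} must exist and be writable")
    return problems
```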
---
## Phase 5: Documentation & Deployment
**Priority: MEDIUM - Final preparation**

**Timeline: Week 3**
### Documentation
- [ ] Document why systemd was chosen over k8s
  - TikTok requires display server access
  - Browser automation needs a display server, which containers lack by default
  - Add to README and architecture docs
- [ ] Create production deployment checklist
  - Pre-deployment verification steps
  - Configuration validation
  - Rollback procedures
- [ ] Create rollback procedures and documentation
  - Back up current version
  - Database/state rollback steps
  - Service restoration process
### Testing & Monitoring
- [ ] Test full production deployment in a staging environment
  - Clone production config
  - Run for 24 hours
  - Verify all sources are working
- [ ] Set up monitoring dashboards and alerts
  - Grafana dashboard for metrics
  - Alert rules for failures
  - Disk usage monitoring
---
## Implementation Priority
### 🔴 Critical (Do First)
1. Fix hardcoded paths in systemd services
2. Add environment variable validation
3. Enable NAS sync
4. Fix error isolation
5. Fix scheduling times
### 🟠 High Priority (Do Second)
6. Implement retry logic
7. Add connection pooling
8. Create pytest unit tests
9. Implement health monitoring
10. Add log rotation
### 🟡 Medium Priority (Do Third)
11. Fix file naming convention
12. Create proper directory structure
13. Standardize markdown format
14. Implement media downloading
15. Add MarkItDown package
### 🟢 Nice to Have (If Time Permits)
16. User agent rotation
17. Integration tests
18. End-to-end tests
19. Monitoring dashboards
20. Comprehensive documentation
---
## Success Criteria
### Minimum Viable Production
- [x] All scrapers functional
- [x] Incremental updates working
- [ ] NAS sync enabled
- [ ] Proper error handling
- [ ] Systemd services portable
- [ ] Environment validation
- [ ] Basic monitoring
### Full Production Ready
- [ ] All specification requirements met
- [ ] Comprehensive test suite
- [ ] Full monitoring and alerting
- [ ] Complete documentation
- [ ] Rollback procedures
- [ ] 99% uptime capability
---
## Notes
### Why Not Docker/Kubernetes?
TikTok scraping requires a display server (X11/Wayland) for browser automation with Scrapling. This makes containerization impractical as containers don't have native display server access. Systemd provides adequate service management for this use case.
### Current Gaps from Specification
1. **Scheduling**: Currently 6 AM/6 PM, spec requires 8 AM/12 PM
2. **NAS Sync**: Implemented but not activated
3. **Media Downloads**: Not implemented
4. **File Naming**: Simplified format used
5. **Directory Structure**: Flat structure instead of source-separated
6. **Testing**: Manual tests only, no pytest suite
7. **Markdown Format**: Custom format instead of specified structure
### Estimated Timeline
- **Week 1**: Critical fixes and spec compliance
- **Week 2**: Testing and error handling
- **Week 3**: Monitoring and documentation
- **Total**: 3 weeks to full production readiness
---
## Quick Start Commands
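```bash
# Step 1: Make systemd services portable
# Double quotes let the shell substitute SERVICE_USER now (systemd does not
# expand ${...} in User=); the :? form aborts the command if it is unset.
sed -i "s/User=ben/User=${SERVICE_USER:?set SERVICE_USER first}/g" systemd/*.service
sed -i 's|/home/ben/dev|/opt|g' systemd/*.service
# Step 2: Enable NAS sync
# Add orchestrator.sync_to_nas() inside run_production.py after the scrapers finish;
# a bare call appended to the end of the file will likely fail if orchestrator
# is created inside a function. Confirm the call is in place:
grep -n "sync_to_nas" run_production.py
# Step 3: Fix scheduling (6 AM/6 PM -> 8 AM/12 PM)
sed -i 's/06:00:00/08:00:00/g' systemd/*.timer
sed -i 's/18:00:00/12:00:00/g' systemd/*.timer
# Step 4: Test deployment
./install_production.sh
sudo systemctl daemon-reload
systemctl status hkia-content-aggregator.timer
systemctl list-timers 'hkia-*'
```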
---
*Last Updated: 2024-12-18*

*Version: 1.0*