hvac-kia-content/docs/PRODUCTION_TODO.md
Ben Reed 7e5377e7b1 docs: Update all documentation to use hkia naming convention
2025-08-19 13:40:27 -03:00


Production Readiness Todo List

Overview

This document outlines all tasks required to meet the original specification and prepare the HKIA Content Aggregator for production deployment. Tasks are organized by priority and phase.

Note: Docker/Kubernetes deployment is not feasible because TikTok scraping requires display server access. The system uses systemd for service management instead.


Phase 1: Meet Original Specification

Priority: CRITICAL - Core functionality gaps
Timeline: Week 1

Scheduling & Timing

  • Fix scheduling times to match spec (8 AM & 12 PM ADT instead of 6 AM & 6 PM)
    • Update systemd timer files
    • Update production configuration
    • Test timer activation

Data Synchronization

  • Enable NAS sync in production runner
    • Add orchestrator.sync_to_nas() call
    • Verify NAS mount path
    • Test rsync functionality
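
The sync step above could be sketched as follows, assuming `sync_to_nas()` shells out to rsync. The free functions `build_rsync_command` and `sync_to_nas`, the rsync flags, and the example paths are illustrative assumptions, not the project's actual implementation.

```python
import subprocess
from pathlib import Path


def build_rsync_command(source: Path, target: Path) -> list[str]:
    """Build an rsync command that copies the contents of source into target."""
    # The trailing slash on the source copies its contents, not the directory itself.
    return ["rsync", "-a", f"{source}/", f"{target}/"]


def sync_to_nas(source: Path, target: Path) -> bool:
    """Sync to the NAS; return False instead of raising when the sync cannot run."""
    # A missing target usually means the NAS is not mounted, so skip the sync.
    if not target.is_dir():
        return False
    try:
        result = subprocess.run(build_rsync_command(source, target), check=False)
    except FileNotFoundError:
        return False  # rsync is not installed on this host
    return result.returncode == 0
```

Checking that the target directory exists before calling rsync covers the "verify NAS mount path" task: an unmounted NAS then shows up as a failed sync rather than as files written to the local mount point.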

File Organization

  • Fix file naming convention to match spec format

    • Change from: update_20241218_060000.md
    • To: hkia_<source>_2024-12-18-T060000.md
  • Create proper directory structure

    data/
    ├── markdown_current/
    ├── markdown_archives/
    │   ├── WordPress/
    │   ├── Instagram/
    │   ├── YouTube/
    │   ├── Podcast/
    │   └── MailChimp/
    ├── media/
    │   ├── WordPress/
    │   ├── Instagram/
    │   ├── YouTube/
    │   ├── Podcast/
    │   └── MailChimp/
    └── .state/
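
A minimal sketch of the naming convention and directory layout above. The helper names `build_filename` and `create_data_dirs` are hypothetical, and the source name is passed through unchanged because the spec excerpt does not say whether `<source>` is lowercased.

```python
from datetime import datetime
from pathlib import Path

# Source names taken from the directory tree above.
SOURCES = ["WordPress", "Instagram", "YouTube", "Podcast", "MailChimp"]


def build_filename(source: str, when: datetime) -> str:
    """Return a name of the form hkia_<source>_2024-12-18-T060000.md."""
    return f"hkia_{source}_{when.strftime('%Y-%m-%d-T%H%M%S')}.md"


def create_data_dirs(root: Path) -> None:
    """Create the directory layout, leaving existing directories untouched."""
    for name in ("markdown_current", ".state"):
        (root / name).mkdir(parents=True, exist_ok=True)
    for parent in ("markdown_archives", "media"):
        for source in SOURCES:
            (root / parent / source).mkdir(parents=True, exist_ok=True)
```

Calling `create_data_dirs(Path("data"))` is safe to repeat on every run, since `exist_ok=True` makes directory creation idempotent.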
    

Content Processing

  • Implement media downloading for all sources

    • YouTube thumbnails and videos (optional)
    • Instagram images and videos
    • WordPress featured images
    • Podcast episode artwork
  • Standardize markdown output format to specification

    # ID: [unique_identifier]
    ## Title: [content_title]
    ## Type: [content_type]
    ## Permalink: [url]
    ## Description:
    [content_description]
    ## Metadata:
    ### Comments: [count]
    ### Likes: [count]
    ### Tags:
    - tag1
    - tag2
    
  • Add MarkItDown package for proper markdown conversion

    • Install markitdown
    • Replace custom formatting logic
    • Test output quality
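
The specified markdown layout could be produced by a small renderer like the one below. This is a sketch using only the standard library: the `ContentItem` dataclass and `render_item` function are hypothetical names, and it does not use MarkItDown.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """Hypothetical container for one scraped item."""
    id: str
    title: str
    type: str
    permalink: str
    description: str
    comments: int = 0
    likes: int = 0
    tags: list[str] = field(default_factory=list)


def render_item(item: ContentItem) -> str:
    """Render one item in the markdown layout from the specification."""
    lines = [
        f"# ID: {item.id}",
        f"## Title: {item.title}",
        f"## Type: {item.type}",
        f"## Permalink: {item.permalink}",
        "## Description:",
        item.description,
        "## Metadata:",
        f"### Comments: {item.comments}",
        f"### Likes: {item.likes}",
        "### Tags:",
    ]
    # One bullet per tag, in the order the scraper supplied them.
    lines.extend(f"- {tag}" for tag in item.tags)
    return "\n".join(lines) + "\n"
```

Keeping the layout in one function gives every scraper the same output format and gives the tests a single place to check it.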

Security Enhancements

  • Implement user agent rotation for web scrapers
    • Create user agent pool
    • Rotate on each request
    • Add to Instagram and TikTok scrapers
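
One way to rotate user agents on each request is a simple round-robin over a pool. The `UserAgentRotator` class is a hypothetical helper, and the pool entries are placeholders rather than real browser strings.

```python
import itertools

# Placeholder strings; a real pool would hold current browser user agents.
USER_AGENT_POOL = [
    "ExampleBrowser/1.0 (placeholder-a)",
    "ExampleBrowser/2.0 (placeholder-b)",
    "ExampleBrowser/3.0 (placeholder-c)",
]


class UserAgentRotator:
    """Hand out user agents from a pool in round-robin order."""

    def __init__(self, pool: list[str]):
        if not pool:
            raise ValueError("user agent pool must not be empty")
        self._cycle = itertools.cycle(pool)

    def next_headers(self) -> dict[str, str]:
        """Return request headers carrying the next user agent in the pool."""
        return {"User-Agent": next(self._cycle)}
```

A scraper would call `next_headers()` before each request and merge the result into its request headers. Random selection would work as well; round-robin is used here only because it is easy to test.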

Phase 2: Testing Suite

Priority: HIGH - Required by specification
Timeline: Week 1-2

Unit Testing

  • Create pytest unit tests with mocking
    • Test each scraper independently
    • Mock external API calls
    • Test state management
    • Test markdown conversion
    • Test error handling
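
The unit tests listed above might look like this sketch. `FeedScraper` is a hypothetical stand-in for a real scraper, with its external API call injected so that `unittest.mock.Mock` can replace it. The test functions use plain assertions, so pytest can collect them without any pytest-specific imports.

```python
from unittest.mock import Mock


class FeedScraper:
    """Hypothetical scraper; `fetch` stands in for any external API call."""

    def __init__(self, fetch):
        self._fetch = fetch
        self.seen_ids: set[str] = set()  # minimal in-memory state

    def fetch_new_items(self) -> list[dict]:
        """Return only items whose id has not been seen before."""
        items = [i for i in self._fetch() if i["id"] not in self.seen_ids]
        self.seen_ids.update(i["id"] for i in items)
        return items


def test_returns_only_new_items():
    fetch = Mock(return_value=[{"id": "1"}, {"id": "2"}])
    scraper = FeedScraper(fetch)
    assert len(scraper.fetch_new_items()) == 2
    # The second run sees the same feed, so nothing is new.
    assert scraper.fetch_new_items() == []
    assert fetch.call_count == 2


def test_fetch_error_propagates():
    fetch = Mock(side_effect=ConnectionError("network down"))
    scraper = FeedScraper(fetch)
    try:
        scraper.fetch_new_items()
    except ConnectionError:
        pass
    else:
        raise AssertionError("expected ConnectionError")
```

Injecting the fetch function keeps the tests off the network and lets each scraper be tested on its own.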

Integration Testing

  • Create integration tests for parallel processing
    • Test ThreadPoolExecutor functionality
    • Test file archiving
    • Test rsync functionality
    • Test scheduling logic

End-to-End Testing

  • Create end-to-end tests with mock data
    • Full workflow simulation
    • Verify markdown output format
    • Verify file naming and placement
    • Test incremental updates

Phase 3: Fix Critical Production Issues

Priority: CRITICAL - Security & reliability
Timeline: Week 2

Systemd Service Fixes

  • Fix hardcoded paths in systemd services

    • Replace User=ben with configurable user
    • Replace /home/ben/dev/hvac-kia-content with /opt/hvac-kia-content
    • Use environment variables or templating
  • Remove hardcoded DISPLAY/XAUTHORITY from systemd services

    • Move to separate environment file
    • Only load for TikTok-specific service
    • Document display server requirements

Startup Validation

  • Add environment variable validation on startup
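    import os

    def validate_environment():
        """Raise ValueError if any required environment variable is unset or empty."""
        required = [
            'WORDPRESS_USERNAME', 'WORDPRESS_API_KEY',
            'YOUTUBE_CHANNEL_URL', 'INSTAGRAM_USERNAME',
            'INSTAGRAM_PASSWORD'
        ]
        # os.getenv returns None for unset variables; empty strings are rejected too.
        missing = [k for k in required if not os.getenv(k)]
        if missing:
            raise ValueError(f"Missing required env vars: {missing}")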
    

Error Handling & Recovery

  • Implement retry logic using configured RETRY_CONFIG

    • Add tenacity library
    • Wrap network calls with retry decorator
    • Use exponential backoff settings
  • Add HTTP connection pooling with requests.Session

    • Create session in base_scraper.__init__
    • Reuse session across requests
    • Configure connection pool size
  • Fix error isolation (don't crash orchestrator on single failure)

    • Continue processing other scrapers
    • Collect all errors for reporting
    • Return partial results
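
The retry and error-isolation tasks above can be sketched together. This example uses a hand-written backoff loop from the standard library as a stand-in for tenacity, and the `attempts` and `base_delay` parameters are assumptions because the contents of RETRY_CONFIG are not shown here. The names `with_retry` and `run_all` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def with_retry(func: Callable[[], list], attempts: int = 3, base_delay: float = 1.0) -> list:
    """Call func, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller record the failure
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def run_all(scrapers: dict[str, Callable[[], list]], base_delay: float = 1.0):
    """Run every scraper; return (results, errors) keyed by scraper name."""
    results: dict[str, list] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(scrapers))) as pool:
        futures = {
            pool.submit(with_retry, func, 3, base_delay): name
            for name, func in scrapers.items()
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing scraper must not stop the others.
                errors[name] = repr(exc)
    return results, errors
```

The orchestrator gets partial results plus a full list of failures, so it can write what succeeded and report the rest instead of crashing on the first error.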

Phase 4: Production Hardening

Priority: HIGH - Operations & monitoring
Timeline: Week 2-3

Monitoring & Alerting

  • Implement health check monitoring and alerting
    • Send ping to healthcheck URL on success
    • Email alerts on critical failures
    • Track metrics (items processed, errors, duration)
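
A minimal sketch of the success ping and metrics tracking, assuming a healthcheck service that treats a missing ping as a failure. `RunMetrics` and `report_run` are hypothetical names, the URL is supplied by the caller, and email alerting is left out.

```python
import time
import urllib.request
from dataclasses import dataclass, field


@dataclass
class RunMetrics:
    """Counters for one aggregation run."""
    items_processed: int = 0
    errors: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def duration_seconds(self) -> float:
        """Seconds elapsed since the run started."""
        return time.monotonic() - self.started_at


def report_run(metrics: RunMetrics, healthcheck_url: str,
               opener=urllib.request.urlopen) -> bool:
    """Ping the healthcheck URL only if the run finished without errors."""
    if metrics.errors:
        return False  # no ping, so the monitor raises its alert
    try:
        with opener(healthcheck_url, timeout=10):
            pass
    except OSError:
        return False  # a failed ping must not crash the run itself
    return True
```

The `opener` parameter exists so tests can substitute a fake and avoid real network calls.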

Logging Improvements

  • Add log rotation with RotatingFileHandler
    • Configure max file size (10MB)
    • Keep 5 backup files
    • Implement for each source
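
The rotation settings above map directly onto `RotatingFileHandler` arguments. In this sketch the `get_source_logger` helper, the `hkia.<source>` logger names, and the log format are assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # 10 MB per log file
BACKUP_COUNT = 5              # keep 5 rotated backups


def get_source_logger(source: str, log_dir: Path) -> logging.Logger:
    """Return a logger that writes to <log_dir>/<source>.log with rotation."""
    logger = logging.getLogger(f"hkia.{source}")
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        log_dir.mkdir(parents=True, exist_ok=True)
        handler = RotatingFileHandler(
            log_dir / f"{source}.log", maxBytes=MAX_BYTES, backupCount=BACKUP_COUNT
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

With one logger per source, each source's log is capped at roughly 60 MB on disk: the active file plus five backups.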

Input Validation

  • Add input validation for configuration values
    • Validate numeric values are positive
    • Check rate limits are reasonable
    • Verify paths exist and are writable
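
The three checks above could be collected into a single validator. The configuration keys (`max_items`, `rate_limit_per_minute`, `data_dir`) and the upper bound on the rate limit are hypothetical, chosen only to illustrate the checks.

```python
import os
from pathlib import Path

MAX_RATE_PER_MINUTE = 600  # assumed sanity bound; tune per source


def _is_positive_number(value) -> bool:
    """True for positive ints and floats; bools are rejected on purpose."""
    return isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config is usable."""
    problems = []
    for key in ("max_items", "rate_limit_per_minute"):
        if not _is_positive_number(config.get(key)):
            problems.append(f"{key} must be a positive number")
    rate = config.get("rate_limit_per_minute")
    if _is_positive_number(rate) and rate > MAX_RATE_PER_MINUTE:
        problems.append(f"rate_limit_per_minute exceeds {MAX_RATE_PER_MINUTE}")
    data_dir = config.get("data_dir")
    if not data_dir or not Path(data_dir).is_dir():
        problems.append("data_dir must be an existing directory")
    elif not os.access(data_dir, os.W_OK):
        problems.append("data_dir must be writable")
    return problems
```

Returning every problem at once, rather than raising on the first, lets an operator fix a broken configuration in a single pass.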

Phase 5: Documentation & Deployment

Priority: MEDIUM - Final preparation
Timeline: Week 3

Documentation

  • Document why systemd was chosen over k8s

    • TikTok requires display server access
    • Browser automation incompatible with containers
    • Add to README and architecture docs
  • Create production deployment checklist

    • Pre-deployment verification steps
    • Configuration validation
    • Rollback procedures
  • Create rollback procedures and documentation

    • Backup current version
    • Database/state rollback steps
    • Service restoration process

Testing & Monitoring

  • Test full production deployment on staging environment

    • Clone production config
    • Run for 24 hours
    • Verify all sources working
  • Set up monitoring dashboards and alerts

    • Grafana dashboard for metrics
    • Alert rules for failures
    • Disk usage monitoring

Implementation Priority

🔴 Critical (Do First)

  1. Fix hardcoded paths in systemd services
  2. Add environment variable validation
  3. Enable NAS sync
  4. Fix error isolation
  5. Fix scheduling times

🟠 High Priority (Do Second)

  1. Implement retry logic
  2. Add connection pooling
  3. Create pytest unit tests
  4. Implement health monitoring
  5. Add log rotation

🟡 Medium Priority (Do Third)

  1. Fix file naming convention
  2. Create proper directory structure
  3. Standardize markdown format
  4. Implement media downloading
  5. Add MarkItDown package

🟢 Nice to Have (If Time Permits)

  1. User agent rotation
  2. Integration tests
  3. End-to-end tests
  4. Monitoring dashboards
  5. Comprehensive documentation

Success Criteria

Minimum Viable Production

  • All scrapers functional
  • Incremental updates working
  • NAS sync enabled
  • Proper error handling
  • Systemd services portable
  • Environment validation
  • Basic monitoring

Full Production Ready

  • All specification requirements met
  • Comprehensive test suite
  • Full monitoring and alerting
  • Complete documentation
  • Rollback procedures
  • 99% uptime capability

Notes

Why Not Docker/Kubernetes?

TikTok scraping requires a display server (X11/Wayland) for browser automation with Scrapling. Because containers have no native display server access, containerization is impractical for this project. Systemd provides adequate service management for this use case.

Current Gaps from Specification

  1. Scheduling: Currently 6 AM/6 PM, spec requires 8 AM/12 PM
  2. NAS Sync: Implemented but not activated
  3. Media Downloads: Not implemented
  4. File Naming: Simplified format used
  5. Directory Structure: Flat structure instead of source-separated
  6. Testing: Manual tests only, no pytest suite
  7. Markdown Format: Custom format instead of specified structure

Estimated Timeline

  • Week 1: Critical fixes and spec compliance
  • Week 2: Testing and error handling
  • Week 3: Monitoring and documentation
  • Total: 3 weeks to full production readiness

Quick Start Commands

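# Step 1: Fix hardcoded user and paths in the systemd units (Phase 3 task)
# The single quotes write the literal placeholder ${SERVICE_USER}; substitute it
# at install time, because systemd does not expand variables in User=
sed -i 's/User=ben/User=${SERVICE_USER}/g' systemd/*.service
sed -i 's|/home/ben/dev|/opt|g' systemd/*.service

# Step 2: Enable NAS sync (Phase 1 task)
# Appending only works if `orchestrator` is defined at module scope;
# otherwise add the call inside the runner by hand
echo "orchestrator.sync_to_nas()" >> run_production.py

# Step 3: Fix scheduling (Phase 1 task)
sed -i 's/06:00:00/08:00:00/g' systemd/*.timer
sed -i 's/18:00:00/12:00:00/g' systemd/*.timer

# Step 4: Test deployment
./install_production.sh
systemctl status hkia-content-aggregator.timer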

Last Updated: 2024-12-18
Version: 1.0