Commit graph

8 commits

Author SHA1 Message Date
Ben Reed
7e5377e7b1 docs: Update all documentation to use hkia naming convention
Documentation Updates:
- Updated project specification with hkia naming and paths
- Modified all markdown documentation files (12 files updated)
- Changed service names from hvac-content-* to hkia-content-*
- Updated NAS paths from /mnt/nas/hvacknowitall to /mnt/nas/hkia
- Replaced all instances of "HVAC Know It All" with "HKIA"

Files Updated:
- README.md - Updated service names and commands
- CLAUDE.md - Updated environment variables and paths
- DEPLOY.md - Updated deployment instructions
- docs/project_specification.md - Updated naming convention specs
- docs/status.md - Updated project status with new naming
- docs/final_status.md - Updated completion status
- docs/deployment_strategy.md - Updated deployment paths
- docs/DEPLOYMENT_CHECKLIST.md - Updated checklist items
- docs/PRODUCTION_TODO.md - Updated production tasks
- BACKLOG_STATUS.md - Updated backlog references
- UPDATED_CAPTURE_STATUS.md - Updated capture status
- FINAL_TALLY_REPORT.md - Updated tally report

Notes:
- Repository name remains hvacknowitall-content (unchanged)
- Project directory remains hvac-kia-content (unchanged)
- All user-facing outputs now use clean "hkia" naming

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-19 13:40:27 -03:00
Ben Reed
daab901e35 refactor: Update naming convention from hvacknowitall to hkia
Major Changes:
- Updated all code references from hvacknowitall/hvacnkowitall to hkia
- Renamed all existing markdown files to use hkia_ prefix
- Updated configuration files, scrapers, and production scripts
- Modified systemd service descriptions to use HKIA
- Changed NAS sync path to /mnt/nas/hkia

Files Updated:
- 20+ source files updated with new naming convention
- 34 markdown files renamed to hkia_* format
- All ScraperConfig brand_name parameters now use 'hkia'
- Documentation updated to reflect new naming

Rationale:
- Shorter, cleaner filenames
- Consistent branding across all outputs
- Easier to type and reference
- Maintains same functionality with improved naming

Next Steps:
- Deploy updated services to production
- Update any external references to old naming
- Monitor scrapers to ensure proper operation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-19 13:35:23 -03:00
Ben Reed
2edc359b5e feat: Implement comprehensive image downloading and cumulative markdown system
Major Updates:
- Added image downloading for Instagram, YouTube, and Podcast scrapers
- Implemented cumulative markdown system for maintaining single source-of-truth files
- Deployed production services with automatic NAS sync for images
- Standardized file naming conventions per project specification

New Features:
- Instagram: Downloads all post images, carousel images, and video thumbnails
- YouTube: Downloads video thumbnails (highest quality available)
- Podcast: Downloads episode artwork/thumbnails
- Consistent image naming: {source}_{item_id}_{type}.{ext}
- Cumulative markdown updates to prevent file proliferation
- Automatic media sync to NAS at /mnt/nas/hvacknowitall/media/

Production Deployment:
- New systemd services: hvac-content-images-8am and hvac-content-images-12pm
- Runs twice daily at 8 AM and 12 PM Atlantic time
- Comprehensive rsync for both markdown and media files

File Structure Compliance:
- Renamed Instagram backlog to spec-compliant format
- Archived legacy directory structures
- Ensured all new files follow <brandName>_<source>_<dateTime>.md format

Testing:
- Successfully captured Instagram posts 1-1000 with images
- Launched next batch (posts 1001-2000) currently in progress
- Verified thumbnail downloads for YouTube and Podcast content

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-19 12:54:21 -03:00
Ben Reed
8ceb858026 Implement cumulative markdown system and API integrations
Major improvements:
- Add CumulativeMarkdownManager for intelligent content merging
- Implement YouTube Data API v3 integration with caption support
- Add MailChimp API integration with content cleaning
- Create single source-of-truth files that grow with updates
- Smart merging: updates existing entries with better data
- Properly combines backlog + incremental daily updates

Features:
- 179/444 YouTube videos now have captions (40.3%)
- MailChimp content cleaned of headers/footers
- All sources consolidated to single files
- Archive management with timestamped versions
- Test suite and documentation included

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-19 10:53:40 -03:00
Ben Reed
8a0b8b4d3f Update documentation with production deployment status
- Update status.md with current production deployment status
- Document completed backlogs (WordPress: 139, Podcast: 428, YouTube: 200)
- Track Instagram progress (50/1000 @ 200/hr) and TikTok queue status
- Create claude.md with implementation notes and key solutions
- Document HTML cleaning fix, rate limit optimization, and NAS sync
- Add testing commands and maintenance notes for future reference
- Include known issues and file structure documentation
2025-08-18 23:14:45 -03:00
Ben Reed
a80af693ba Add comprehensive production documentation and testing
Documentation Added:
- ARCHITECTURE_DECISIONS.md: Explains why systemd over k8s (TikTok display requirements)
- DEPLOYMENT_CHECKLIST.md: Step-by-step deployment procedures
- ROLLBACK_PROCEDURES.md: Emergency rollback and recovery procedures
- test_production_deployment.py: Automated deployment verification script

Key Documentation Highlights:
- Detailed explanation of containerization limitations with browser automation
- Complete deployment checklist with pre/post verification steps
- Rollback scenarios with recovery time objectives
- Emergency contact templates and backup procedures
- Automated test script for production readiness

17 of 25 tasks completed (68% done)
Remaining work focuses on spec compliance and testing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-18 20:20:52 -03:00
Ben Reed
05218a873b Fix critical production issues and improve spec compliance
Production Readiness Improvements:
- Fixed scheduling to match spec (8 AM & 12 PM ADT instead of 6 AM/6 PM)
- Enabled NAS synchronization in production runner with error handling
- Fixed file naming convention to spec format (hvacknowitall_combined_YYYY-MM-DD-THHMMSS.md)
- Made systemd services portable (removed hardcoded user/paths)
- Added environment variable validation on startup
- Moved DISPLAY/XAUTHORITY to .env configuration

Systemd Improvements:
- Created template service file (@.service) for any user
- Changed all paths to /opt/hvac-kia-content
- Updated installation script for portable deployment
- Fixed service dependencies and resource limits

Documentation:
- Created comprehensive PRODUCTION_TODO.md with 25 tasks
- Added PRODUCTION_GUIDE.md with deployment instructions
- Documented spec compliance gaps (65% complete)

Remaining work includes retry logic, connection pooling, media downloads,
and pytest test suite as documented in PRODUCTION_TODO.md

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-18 20:07:55 -03:00
Ben Reed
f9a8e719a7 Initial commit: Project foundation with base scraper and tests
- Set up UV environment with all required packages
- Created comprehensive project structure
- Implemented abstract BaseScraper class with TDD
- Added documentation (project spec, implementation plan, status)
- Configured .env for credentials (not committed)
- All base scraper tests passing (9/9)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-18 12:15:17 -03:00