Major Changes:
- Updated all code references from hvacknowitall/hvacnkowitall to hkia
- Renamed all existing markdown files to use hkia_ prefix
- Updated configuration files, scrapers, and production scripts
- Modified systemd service descriptions to use HKIA
- Changed NAS sync path to /mnt/nas/hkia
Files Updated:
- 20+ source files switched to the new naming convention
- 34 markdown files renamed to hkia_* format
- All ScraperConfig brand_name parameters now use 'hkia'
- Documentation updated to reflect new naming
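
The markdown rename described above could be scripted roughly as follows. This is only a sketch, not the script used for this commit: the old filename prefix `hvacknowitall_`, the `*.md` glob, and the helper name `rename_markdown_files` are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed old filename prefix(es); the commit does not spell these out.
OLD_PREFIXES = ("hvacknowitall_",)
NEW_PREFIX = "hkia_"


def rename_markdown_files(root: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Plan (and optionally perform) renames of old-prefixed markdown files."""
    planned = []
    # sorted() materializes the listing first, so renaming while looping is safe.
    for path in sorted(root.rglob("*.md")):
        for old in OLD_PREFIXES:
            if path.name.startswith(old):
                target = path.with_name(NEW_PREFIX + path.name[len(old):])
                planned.append((path, target))
                if not dry_run:
                    path.rename(target)
                break
    return planned
```

Running with `dry_run=True` first returns the planned (old, new) pairs without touching the filesystem, which makes it easy to check the count against the 34 files listed above.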
Rationale:
- Shorter, cleaner filenames
- Consistent branding across all outputs
- Easier to type and reference
- No functional changes; only names change
Next Steps:
- Deploy updated services to production
- Update any external references to old naming
- Monitor scrapers to ensure proper operation
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Update base_scraper.py convert_to_markdown() to properly clean HTML
- Remove script/style blocks and their content before conversion
- Strip inline JavaScript event handlers
- Clean up br tags and excessive blank lines
- Fix malformed comparison operators that look like tags
- Apply HTML cleaning during content extraction rather than as a post-processing step
- Test confirms WordPress content now generates clean markdown without HTML
This ensures all future WordPress scraping produces specification-compliant
markdown without any HTML/XML contamination.
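
The cleaning steps listed above can be sketched roughly as below. This is an illustrative approximation, not the actual convert_to_markdown() code in base_scraper.py: the function name `clean_html` is hypothetical, and treating "malformed comparison operators" as a bare `<` that does not start a tag is an assumption.

```python
import re

# Matches an opening tag such as <p onclick="..."> (assumes no '>' inside attribute values).
_TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")
# Matches an inline event-handler attribute such as onclick="..." inside a tag.
_HANDLER_RE = re.compile(r"""\s+on[a-z]+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def clean_html(html: str) -> str:
    """Hypothetical helper: strip noise from HTML before markdown conversion."""
    # 1. Remove script/style blocks together with their content.
    html = re.sub(r"<(script|style)\b[^>]*>.*?</\1\s*>", "", html,
                  flags=re.IGNORECASE | re.DOTALL)
    # 2. Strip inline JavaScript event handlers, but only inside tags.
    html = _TAG_RE.sub(lambda m: _HANDLER_RE.sub("", m.group(0)), html)
    # 3. Turn <br> variants into plain newlines.
    html = re.sub(r"<br\s*/?>", "\n", html, flags=re.IGNORECASE)
    # 4. Escape a bare '<' that cannot start a tag (assumed meaning of
    #    "comparison operators that look like tags").
    html = re.sub(r"<(?![A-Za-z/!?])", "&lt;", html)
    # 5. Collapse runs of three or more newlines into one blank line.
    html = re.sub(r"\n{3,}", "\n\n", html)
    return html.strip()


sample = ('<p onclick="track()">Use when x < 5<br/>ok</p>'
          '<script>alert(1)</script>\n\n\n\n<style>p{}</style>done')
cleaned = clean_html(sample)
```

Regex-based cleaning like this is fragile on unusual markup (for example a `>` inside a quoted attribute value), so a real HTML parser may be the safer choice for the production scraper.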