Commit graph

2 commits

Author SHA1 Message Date
Ben Reed
7e5377e7b1 docs: Update all documentation to use hkia naming convention
Documentation Updates:
- Updated project specification with hkia naming and paths
- Modified all markdown documentation files (12 files updated)
- Changed service names from hvac-content-* to hkia-content-*
- Updated NAS paths from /mnt/nas/hvacknowitall to /mnt/nas/hkia
- Replaced all instances of "HVAC Know It All" with "HKIA"

Files Updated:
- README.md - Updated service names and commands
- CLAUDE.md - Updated environment variables and paths
- DEPLOY.md - Updated deployment instructions
- docs/project_specification.md - Updated naming convention specs
- docs/status.md - Updated project status with new naming
- docs/final_status.md - Updated completion status
- docs/deployment_strategy.md - Updated deployment paths
- docs/DEPLOYMENT_CHECKLIST.md - Updated checklist items
- docs/PRODUCTION_TODO.md - Updated production tasks
- BACKLOG_STATUS.md - Updated backlog references
- UPDATED_CAPTURE_STATUS.md - Updated capture status
- FINAL_TALLY_REPORT.md - Updated tally report

Notes:
- Repository name remains hvacknowitall-content (unchanged)
- Project directory remains hvac-kia-content (unchanged)
- All user-facing outputs now use clean "hkia" naming

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-19 13:40:27 -03:00
Ben Reed
8b83185130 Fix HTML/XML contamination in WordPress markdown extraction
- Update base_scraper.py convert_to_markdown() to properly clean HTML
- Remove script/style blocks and their content before conversion
- Strip inline JavaScript event handlers
- Clean up br tags and excessive blank lines
- Fix malformed comparison operators that look like tags
- Add comprehensive HTML cleaning during content extraction (not after)
- Test confirms WordPress content now generates clean markdown without HTML

This ensures all future WordPress scraping produces specification-compliant
markdown without any HTML/XML contamination.
2025-08-18 23:11:08 -03:00