hvac-kia-content/data_production_backlog/.cookies/youtube_cookies.txt
Ben Reed 8b83185130 Fix HTML/XML contamination in WordPress markdown extraction
- Update base_scraper.py convert_to_markdown() to properly clean HTML
- Remove script/style blocks and their content before conversion
- Strip inline JavaScript event handlers
- Clean up br tags and excessive blank lines
- Fix malformed comparison operators that look like tags
- Add comprehensive HTML cleaning during content extraction (not after)
- Test confirms WordPress content now generates clean markdown without HTML

This ensures all future WordPress scraping produces specification-compliant
markdown without any HTML/XML contamination.
2025-08-18 23:11:08 -03:00
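
The cleaning steps listed in the commit message can be sketched roughly as follows. This is not the actual `convert_to_markdown()` code from `base_scraper.py`, which is not shown here; the function name `clean_html_fragment` and all regular expressions are assumptions made for illustration, and the sketch only covers the pre-conversion cleanup, not the markdown conversion itself.

```python
import re

# Hypothetical patterns; the real base_scraper.py may use different ones.
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.I | re.S)
STRAY_LT_RE = re.compile(r"<(?=[\s\d=])")  # "<" that cannot start a tag
TAG_RE = re.compile(r"<[^<>]+>")
EVENT_ATTR_RE = re.compile(r"""\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.I)
BR_RE = re.compile(r"<br\s*/?>", re.I)
BLANK_LINES_RE = re.compile(r"\n{3,}")


def clean_html_fragment(html: str) -> str:
    """Clean an HTML fragment before it is handed to a markdown converter."""
    # 1. Drop script/style blocks together with their content.
    text = SCRIPT_STYLE_RE.sub("", html)
    # 2. Escape comparison operators such as "x < 5" so they are not
    #    mistaken for the start of a tag.
    text = STRAY_LT_RE.sub("&lt;", text)
    # 3. Strip inline JavaScript event handlers (onclick, onload, ...)
    #    from inside each remaining tag.
    text = TAG_RE.sub(lambda m: EVENT_ATTR_RE.sub("", m.group(0)), text)
    # 4. Turn <br> tags into newlines.
    text = BR_RE.sub("\n", text)
    # 5. Collapse runs of three or more newlines into one blank line.
    text = BLANK_LINES_RE.sub("\n\n", text)
    return text.strip()


if __name__ == "__main__":
    sample = (
        '<p onclick="track()">Temp < 5 degrees</p>'
        "<script>alert(1)</script><br/>\n\n\n\nDone"
    )
    print(clean_html_fragment(sample))
```

Running the cleanup before conversion, rather than patching the generated markdown afterwards, matches the "during content extraction (not after)" point in the commit message. Regex-based cleaning is only a sketch; a full HTML parser would handle edge cases such as `>` inside attribute values more reliably.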


# Netscape HTTP Cookie File
# This file is generated by yt-dlp. Do not edit.
.youtube.com TRUE / FALSE 0 PREF hl=en&tz=UTC
.youtube.com TRUE / TRUE 0 SOCS CAI
.youtube.com TRUE / TRUE 1755567962 GPS 1
.youtube.com TRUE / TRUE 0 YSC 7cc8-LrPd_Q
.youtube.com TRUE / TRUE 1771118162 VISITOR_INFO1_LIVE za_nyLN37wM
.youtube.com TRUE / TRUE 1771118162 VISITOR_PRIVACY_METADATA CgJDQRIEGgAgNQ%3D%3D
.youtube.com TRUE / TRUE 1771118162 __Secure-ROLLOUT_TOKEN CM7Wy8jf2ozaPxDbhefL2ZWPAxjbhefL2ZWPAw%3D%3D
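
Each entry above has the seven fields of the Netscape cookie file layout: domain, include-subdomains flag, path, secure flag, expiry (Unix timestamp), name, and value. A minimal sketch of reading such entries is shown below. The names `CookieEntry` and `parse_cookie_lines` are hypothetical, and the parser splits on any whitespace because the fields appear space-separated in this listing; on-disk files in this format are normally tab-separated.

```python
from typing import List, NamedTuple


class CookieEntry(NamedTuple):
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: int  # Unix timestamp; 0 is conventionally a session cookie
    name: str
    value: str


def parse_cookie_lines(text: str) -> List[CookieEntry]:
    """Parse Netscape-style cookie lines, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split into at most seven fields so the value keeps any spaces.
        fields = line.split(None, 6)
        if len(fields) < 6:
            continue  # malformed line; ignore it in this sketch
        domain, subdomains, path, secure, expires, name = fields[:6]
        value = fields[6] if len(fields) == 7 else ""
        entries.append(
            CookieEntry(
                domain=domain,
                include_subdomains=(subdomains == "TRUE"),
                path=path,
                secure=(secure == "TRUE"),
                expires=int(expires),
                name=name,
                value=value,
            )
        )
    return entries


if __name__ == "__main__":
    sample = (
        "# Netscape HTTP Cookie File\n"
        ".youtube.com TRUE / FALSE 0 PREF hl=en&tz=UTC\n"
        ".youtube.com TRUE / TRUE 1755567962 GPS 1\n"
    )
    for entry in parse_cookie_lines(sample):
        print(entry.name, entry.secure, entry.expires)
```

For real tab-separated files, Python's standard `http.cookiejar.MozillaCookieJar` can load this format directly; the hand-written parser here is only meant to make the field layout explicit.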