- Update base_scraper.py convert_to_markdown() to properly clean HTML
- Remove script/style blocks and their content before conversion
- Strip inline JavaScript event handlers
- Clean up br tags and excessive blank lines
- Fix malformed comparison operators that look like tags
- Add comprehensive HTML cleaning during content extraction (not after)
- Test confirms WordPress content now generates clean markdown without HTML

This ensures all future WordPress scraping produces specification-compliant markdown without any HTML/XML contamination.
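The cleaning steps listed above could look roughly like the sketch below. This is not the actual code from base_scraper.py, which is not shown here: the helper name `clean_html` and all regex constants are hypothetical, and a regex-based approach is an assumption (the real implementation may use an HTML parser instead). It only illustrates the order of operations: drop script/style blocks, strip `on*` handlers inside tags, turn br tags into newlines, escape stray `<` characters, and collapse blank lines.

```python
import re

# Hypothetical helper names; the real convert_to_markdown() is not shown in the commit.
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
TAG_RE = re.compile(r"<[a-zA-Z][^<>]*>")
EVENT_HANDLER_RE = re.compile(
    r"""\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE
)
BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)
# A "<" that does not start a tag, closing tag, or comment, e.g. "a < b" or "x <= 5".
STRAY_LT_RE = re.compile(r"<(?![a-zA-Z/!])")
BLANK_LINES_RE = re.compile(r"\n{3,}")


def _strip_handlers(match: re.Match) -> str:
    """Remove inline event handlers (onclick=..., onload=...) from one opening tag."""
    return EVENT_HANDLER_RE.sub("", match.group(0))


def clean_html(html: str) -> str:
    """Clean raw HTML before it is handed to a markdown converter (sketch only)."""
    # 1. Remove script/style blocks together with their content.
    html = SCRIPT_STYLE_RE.sub("", html)
    # 2. Strip inline JavaScript event handlers, but only inside opening tags.
    html = TAG_RE.sub(_strip_handlers, html)
    # 3. Turn br tags into plain newlines.
    html = BR_RE.sub("\n", html)
    # 4. Escape comparison operators that could be mistaken for tags.
    html = STRAY_LT_RE.sub("&lt;", html)
    # 5. Collapse runs of blank lines to a single blank line.
    html = BLANK_LINES_RE.sub("\n\n", html)
    return html.strip()


if __name__ == "__main__":
    sample = (
        '<p onclick="track()">a < b</p><script>alert(1)</script>'
        "<br/>\n\n\n\n<style>p{}</style>done"
    )
    print(clean_html(sample))
```

Running the cleaning before conversion, as the commit describes, means the markdown converter never sees script bodies or handler attributes. The sketch does not handle attribute values that contain `>` or comparisons written without spaces before a letter (such as `a<b`); a parser-based approach would be more robust for those cases.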
10 lines
499 B
Text
# Netscape HTTP Cookie File
# This file is generated by yt-dlp. Do not edit.

.youtube.com TRUE / FALSE 0 PREF hl=en&tz=UTC
.youtube.com TRUE / TRUE 0 SOCS CAI
.youtube.com TRUE / TRUE 1755566913 GPS 1
.youtube.com TRUE / TRUE 0 YSC 43Nie4OEFSs
.youtube.com TRUE / TRUE 1771117113 __Secure-ROLLOUT_TOKEN CIP-nLirnrH_CBDY79nX1ZWPAxjY79nX1ZWPAw%3D%3D
.youtube.com TRUE / TRUE 1771117113 VISITOR_INFO1_LIVE SdhPQiaSkwM
.youtube.com TRUE / TRUE 1771117113 VISITOR_PRIVACY_METADATA CgJDQRIEGgAgWw%3D%3D