- Update base_scraper.py convert_to_markdown() to properly clean HTML
- Remove script/style blocks and their content before conversion
- Strip inline JavaScript event handlers
- Clean up br tags and excessive blank lines
- Fix malformed comparison operators that look like tags
- Add comprehensive HTML cleaning during content extraction (not after)
- Test confirms WordPress content now generates clean markdown without HTML

This ensures all future WordPress scraping produces specification-compliant markdown without any HTML/XML contamination.
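
The cleaning steps listed in this commit message could look roughly like the sketch below. It is not the actual `convert_to_markdown()` from base_scraper.py: the function name `clean_html_fragment`, the regular expressions, and the order of the steps are all assumptions made for illustration, and it uses only the standard library.

```python
import re

# Hypothetical helper for illustration; not the real base_scraper.py code.

# <script>...</script> and <style>...</style>, including their content
_SCRIPT_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# A "<" followed by whitespace, "=", or a digit cannot open a real tag,
# so it is treated as a comparison operator and escaped.
_STRAY_LT = re.compile(r"<(?=[\s=\d])")
# Anything that looks like a tag (assumes no ">" inside attribute values)
_TAG = re.compile(r"<[^<>]+>")
# Inline handlers such as onclick="..." or onload=foo
_EVENT_HANDLER = re.compile(
    r"""\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE
)
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
_BLANK_RUN = re.compile(r"\n{3,}")


def clean_html_fragment(html: str) -> str:
    """Clean an HTML fragment before it is handed to a markdown converter."""
    # 1. Drop script/style blocks together with their content.
    html = _SCRIPT_STYLE.sub("", html)
    # 2. Escape "<" characters that are comparison operators, not tags.
    html = _STRAY_LT.sub("&lt;", html)
    # 3. Strip event-handler attributes, but only inside tags.
    html = _TAG.sub(lambda m: _EVENT_HANDLER.sub("", m.group(0)), html)
    # 4. Turn <br> variants into newlines.
    html = _BR_TAG.sub("\n", html)
    # 5. Collapse runs of three or more newlines into one blank line.
    html = _BLANK_RUN.sub("\n\n", html)
    return html.strip()
```

Running the cleanup before conversion, as the commit describes, means the markdown converter never sees script bodies or stray `<` characters. A regex pass like this is only an approximation; a real implementation may well rely on an HTML parser instead.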
10 lines, 499 B, Text
# Netscape HTTP Cookie File
# This file is generated by yt-dlp.  Do not edit.

.youtube.com	TRUE	/	FALSE	0	PREF	hl=en&tz=UTC
.youtube.com	TRUE	/	TRUE	0	SOCS	CAI
.youtube.com	TRUE	/	TRUE	1755566913	GPS	1
.youtube.com	TRUE	/	TRUE	0	YSC	43Nie4OEFSs
.youtube.com	TRUE	/	TRUE	1771117113	__Secure-ROLLOUT_TOKEN	CIP-nLirnrH_CBDY79nX1ZWPAxjY79nX1ZWPAw%3D%3D
.youtube.com	TRUE	/	TRUE	1771117113	VISITOR_INFO1_LIVE	SdhPQiaSkwM
.youtube.com	TRUE	/	TRUE	1771117113	VISITOR_PRIVACY_METADATA	CgJDQRIEGgAgWw%3D%3D
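
Each entry above has seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp (0 for a session cookie), name, and value. As a minimal sketch of reading such a file, assuming that layout, the following parser uses hypothetical names (`CookieRow`, `parse_cookie_lines`) that are not part of yt-dlp or this repository.

```python
from typing import NamedTuple


class CookieRow(NamedTuple):
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: int  # Unix timestamp; 0 marks a session cookie
    name: str
    value: str


def parse_cookie_lines(text: str) -> list[CookieRow]:
    """Parse Netscape-format cookie text into rows (illustrative only)."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and comments. Simplification: this also skips
        # entries that some tools write with a "#HttpOnly_" prefix.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # ignore malformed lines rather than guess
        domain, subdomains, path, secure, expires, name, value = fields
        rows.append(
            CookieRow(
                domain,
                subdomains == "TRUE",
                path,
                secure == "TRUE",
                int(expires),
                name,
                value,
            )
        )
    return rows
```

A manual split is used here instead of a cookie-jar class so that session cookies with expiry 0 are kept as they appear in the file.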