- Update base_scraper.py convert_to_markdown() to properly clean HTML
- Remove script/style blocks and their content before conversion
- Strip inline JavaScript event handlers
- Clean up br tags and excessive blank lines
- Fix malformed comparison operators that look like tags
- Add comprehensive HTML cleaning during content extraction (not after)
- Test confirms WordPress content now generates clean markdown without HTML

This ensures all future WordPress scraping produces specification-compliant markdown without any HTML/XML contamination.
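
The cleaning steps listed above could look roughly like the sketch below. This is not the code from base_scraper.py, which is not shown here: the function name `clean_html`, the regex-based approach, and the reading of "malformed comparison operators" as a stray `<` that does not start a tag are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; the real convert_to_markdown() may use a different approach.
_SCRIPT_STYLE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.I | re.S)
_EVENT_ATTR = re.compile(r"""\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.I)
_BR = re.compile(r"<br\s*/?>", re.I)
_STRAY_LT = re.compile(r"<(?![A-Za-z/!])")  # assumption: "<" not opening a tag
_BLANK_RUN = re.compile(r"\n{3,}")


def clean_html(html: str) -> str:
    """Clean raw HTML before it is handed to a markdown converter."""
    # 1. Drop <script>/<style> blocks together with their content.
    html = _SCRIPT_STYLE.sub("", html)
    # 2. Strip inline event handlers such as onclick="...".
    #    A regex cannot tell attributes from body text, so this is approximate.
    html = _EVENT_ATTR.sub("", html)
    # 3. Turn <br>, <br/> and <br /> into plain newlines.
    html = _BR.sub("\n", html)
    # 4. Escape comparison operators ("<= 5", "< 10") that would parse as tags.
    html = _STRAY_LT.sub("&lt;", html)
    # 5. Collapse runs of three or more newlines into one blank line.
    html = _BLANK_RUN.sub("\n\n", html)
    return html.strip()


if __name__ == "__main__":
    sample = (
        '<p onclick="track()">Use when x <= 5</p>'
        "<script>alert(1)</script><br><br/><br /><br>Done"
    )
    print(clean_html(sample))
```

Running the cleaning before conversion, as the commit describes, means the converter never sees script bodies or handler attributes, so nothing has to be scrubbed out of the generated markdown afterwards.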

7 lines, No EOL, 163 B, JSON

{
  "last_update": "2025-08-18T22:14:30.231559",
  "last_item_count": 139,
  "backlog_captured": true,
  "backlog_timestamp": "20250818_221430",
  "last_id": 329
}