hvac-kia-content/src
Ben Reed ef66d3bbc5 CRITICAL FIX: MailChimp content cleaning bug causing missing newsletter body
Issue:
- MailChimp campaigns missing body content in markdown files
- Logic flaw in HTML-to-markdown conversion flow
- Double cleaning and incorrect empty content checks

Root Cause:
- The HTML fallback check inspected already-cleaned content instead of the original plain_text
- As a result, HTML content was never converted when plain_text was empty
- Cleaning was applied twice whenever HTML was converted

Fix:
- Check original plain_text before deciding HTML conversion
- Convert HTML first, then clean once (eliminate double cleaning)
- Preserve all legitimate newsletter body content
- Keep header/footer cleaning patterns (they are appropriate)
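
The corrected flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in mailchimp_api_scraper_v2.py: the names `extract_body`, `clean_content`, `html_to_text` and `BOILERPLATE_PATTERNS` are hypothetical, the two header/footer patterns are made-up examples, and the tag-stripping parser is only a stand-in for whatever HTML-to-markdown converter the scraper actually uses.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Stand-in for a real HTML-to-markdown converter: keeps text only."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        # Emit a line break after common block-level elements.
        if tag in ("p", "div", "h1", "h2", "h3", "li"):
            self.parts.append("\n")


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical header/footer patterns (assumed, not taken from the repo).
BOILERPLATE_PATTERNS = [
    re.compile(r"^View this email in your browser.*$", re.MULTILINE | re.IGNORECASE),
    re.compile(r"^Unsubscribe.*$", re.MULTILINE | re.IGNORECASE),
]


def clean_content(text):
    """Remove header/footer lines and collapse runs of blank lines."""
    for pattern in BOILERPLATE_PATTERNS:
        text = pattern.sub("", text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def extract_body(plain_text, html):
    # Decide on the ORIGINAL plain_text, before any cleaning has run.
    if plain_text and plain_text.strip():
        raw = plain_text
    elif html:
        # Convert HTML first; cleaning happens below.
        raw = html_to_text(html)
    else:
        raw = ""
    # Clean exactly once, whichever source was chosen.
    return clean_content(raw)
```

Under these assumptions, a campaign with an empty plain_text and an HTML body goes through conversion and then a single cleaning pass, so the body text survives while the header and footer lines are dropped.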

Impact:
- All newsletter content now preserved correctly
- Headers/footers still properly removed
- Next production run will capture complete content

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-19 11:19:32 -03:00

| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | Initial commit: Project foundation with base scraper and tests | 2025-08-18 12:15:17 -03:00 |
| base_scraper.py | Fix HTML/XML contamination in WordPress markdown extraction | 2025-08-18 23:11:08 -03:00 |
| base_scraper_cumulative.py | Implement cumulative markdown system and API integrations | 2025-08-19 10:53:40 -03:00 |
| cumulative_markdown_manager.py | Implement cumulative markdown system and API integrations | 2025-08-19 10:53:40 -03:00 |
| instagram_scraper.py | Optimize Instagram scraper and increase capture targets to 1000 | 2025-08-18 22:59:11 -03:00 |
| mailchimp_api_scraper_v2.py | CRITICAL FIX: MailChimp content cleaning bug causing missing newsletter body | 2025-08-19 11:19:32 -03:00 |
| mailchimp_archive_scraper.py | Fix critical production issues and improve spec compliance | 2025-08-18 20:07:55 -03:00 |
| orchestrator.py | Fix NAS sync to include media files instead of logs | 2025-08-18 21:52:28 -03:00 |
| rss_scraper.py | feat: Enhance TikTok scraper with caption fetching and improved video discovery | 2025-08-18 18:59:46 -03:00 |
| tiktok_scraper.py | Fix critical production issues and improve spec compliance | 2025-08-18 20:07:55 -03:00 |
| tiktok_scraper_advanced.py | feat: Enhance TikTok scraper with caption fetching and improved video discovery | 2025-08-18 18:59:46 -03:00 |
| tiktok_scraper_scrapling.py | Fix critical production issues and improve spec compliance | 2025-08-18 20:07:55 -03:00 |
| wordpress_scraper.py | Implement retry logic, connection pooling, and production hardening | 2025-08-18 20:16:02 -03:00 |
| youtube_api_scraper_v2.py | Implement cumulative markdown system and API integrations | 2025-08-19 10:53:40 -03:00 |
| youtube_scraper.py | Fix critical production issues and improve spec compliance | 2025-08-18 20:07:55 -03:00 |