HKIA Content Aggregation System - Complete content scraping and markdown generation for 5 sources (WordPress, MailChimp RSS, Podcast RSS, YouTube, Instagram)
Find a file
Ben Reed 8a0b8b4d3f Update documentation with production deployment status
- Update status.md with current production deployment status
- Document completed backlogs (WordPress: 139, Podcast: 428, YouTube: 200)
- Track Instagram progress (50/1000 @ 200/hr) and TikTok queue status
- Create claude.md with implementation notes and key solutions
- Document HTML cleaning fix, rate limit optimization, and NAS sync
- Add testing commands and maintenance notes for future reference
- Include known issues and file structure documentation
2025-08-18 23:14:45 -03:00
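The commit message above gives enough numbers to estimate how long the remaining Instagram backlog will take. The sketch below assumes "200/hr" means a steady capture rate of 200 posts per hour with no pauses or throttling. The helper `remaining_hours` is a hypothetical name for illustration only and is not part of this repository. The same snippet also totals the completed backlogs listed in the commit.

```python
def remaining_hours(captured: int, target: int, rate_per_hour: int) -> float:
    """Hours needed to capture the rest of a backlog at a fixed hourly rate.

    Assumes a constant rate; real scrapers may slow down under rate limits.
    """
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    remaining = max(target - captured, 0)  # never negative once target is hit
    return remaining / rate_per_hour


# Instagram figures from the commit message: 50 of 1000 captured at 200/hr.
hours = remaining_hours(captured=50, target=1000, rate_per_hour=200)
print(f"{hours:.2f} h")  # 4.75 h

# Completed backlogs listed in the same commit message.
completed = {"WordPress": 139, "Podcast": 428, "YouTube": 200}
print(sum(completed.values()))  # 767
```

At the stated rate, the remaining 950 Instagram posts would take about 4 hours 45 minutes. This estimate holds only if the rate stays at 200/hr for the whole run.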
config Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
data_production_backlog Fix HTML/XML contamination in WordPress markdown extraction 2025-08-18 23:11:08 -03:00
data_quick_test Fix HTML/XML contamination in WordPress markdown extraction 2025-08-18 23:11:08 -03:00
docs Update documentation with production deployment status 2025-08-18 23:14:45 -03:00
monitoring Add comprehensive monitoring and alerting system 2025-08-18 21:35:28 -03:00
src Fix HTML/XML contamination in WordPress markdown extraction 2025-08-18 23:11:08 -03:00
systemd Add comprehensive monitoring and alerting system 2025-08-18 21:35:28 -03:00
test_data Fix HTML/XML contamination in WordPress markdown extraction 2025-08-18 23:11:08 -03:00
tests Add comprehensive test infrastructure 2025-08-18 21:16:14 -03:00
.env.production Optimize Instagram scraper and increase capture targets to 1000 2025-08-18 22:59:11 -03:00
.gitignore Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
.python-version Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
automated_backlog_capture.py Optimize Instagram scraper and increase capture targets to 1000 2025-08-18 22:59:11 -03:00
BACKLOG_STATUS.md Fix HTML/XML contamination in WordPress markdown extraction 2025-08-18 23:11:08 -03:00
capture_tiktok_backlog.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
CLAUDE.md Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
claude.md Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
clean_markdown.py Fix HTML/XML contamination in WordPress markdown extraction 2025-08-18 23:11:08 -03:00
debug_wordpress.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
debug_wordpress_raw.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
debug_youtube_detailed.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
debug_youtube_videos.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
deploy_production.sh Optimize Instagram scraper and increase capture targets to 1000 2025-08-18 22:59:11 -03:00
detailed_monitor.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
FINAL_TALLY_REPORT.md Fix HTML/XML contamination in WordPress markdown extraction 2025-08-18 23:11:08 -03:00
install.sh Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
install_production.sh Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
main.py Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
monitor_backlog.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
monitor_backlog_progress.sh Optimize Instagram scraper and increase capture targets to 1000 2025-08-18 22:59:11 -03:00
production_backlog_capture.py Optimize Instagram scraper and increase capture targets to 1000 2025-08-18 22:59:11 -03:00
pyproject.toml Add final dependencies for monitoring and testing 2025-08-18 21:49:43 -03:00
quick_backlog_test.py Fix HTML/XML contamination in WordPress markdown extraction 2025-08-18 23:11:08 -03:00
requirements.txt Implement retry logic, connection pooling, and production hardening 2025-08-18 20:16:02 -03:00
requirements_new.txt Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
resume_instagram_capture.py Optimize Instagram scraper and increase capture targets to 1000 2025-08-18 22:59:11 -03:00
run_production.py Implement retry logic, connection pooling, and production hardening 2025-08-18 20:16:02 -03:00
status.md Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
test_instagram_debug.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
test_instagram_fix.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
test_markitdown_fix.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
test_production_deployment.py Add comprehensive production documentation and testing 2025-08-18 20:20:52 -03:00
test_real_data.py feat: Enhance TikTok scraper with caption fetching and improved video discovery 2025-08-18 18:59:46 -03:00
test_sources_simple.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
test_tiktok_advanced.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
test_tiktok_scrapling.py Fix critical production issues and improve spec compliance 2025-08-18 20:07:55 -03:00
test_wordpress_clean.py Fix HTML/XML contamination in WordPress markdown extraction 2025-08-18 23:11:08 -03:00
UPDATED_CAPTURE_STATUS.md Optimize Instagram scraper and increase capture targets to 1000 2025-08-18 22:59:11 -03:00
uv.lock Add final dependencies for monitoring and testing 2025-08-18 21:49:43 -03:00
validate_production.sh Optimize Instagram scraper and increase capture targets to 1000 2025-08-18 22:59:11 -03:00