HKIA Content Aggregation System - Complete content scraping and markdown generation for 5 sources (WordPress, MailChimp RSS, Podcast RSS, YouTube, Instagram)
Ben Reed 1e5880bf00 feat: Enhance TikTok scraper with caption fetching and improved video discovery
- Add optional individual video page fetching for complete captions
- Implement profile scrolling to discover more videos (27+ vs 18)
- Add configurable rate limiting and anti-detection delays
- Fix RSS scrapers to support max_items parameter for backlog fetching
- Add fetch_captions parameter with max_caption_fetches limit
- Include additional metadata extraction (likes, comments, shares, duration)
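
A minimal sketch of how the `fetch_captions` / `max_caption_fetches` cap and the configurable delay might fit together. Only those two parameter names come from the commit message; the function name `enrich_with_captions`, the `fetch_caption` callback, the `delay_range` parameter, and the video dict shape are hypothetical and are not taken from the repository's code.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Tuple


def enrich_with_captions(
    videos: List[Dict[str, str]],
    fetch_caption: Callable[[str], Optional[str]],  # hypothetical per-video page fetcher
    fetch_captions: bool = True,
    max_caption_fetches: int = 10,
    delay_range: Tuple[float, float] = (1.0, 3.0),  # hypothetical delay bounds, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, str]]:
    """Add a 'caption' key to at most max_caption_fetches videos (mutates in place)."""
    if not fetch_captions:
        return videos

    fetched = 0
    for video in videos:
        if fetched >= max_caption_fetches:
            break  # cap reached: remaining videos keep only their profile-level data
        caption = fetch_caption(video["url"])
        fetched += 1  # count the attempt even if no caption came back
        if caption:
            video["caption"] = caption
        # Randomized pause between page fetches to avoid a fixed request rhythm.
        sleep(random.uniform(*delay_range))
    return videos
```

Passing `sleep` as a parameter keeps the delay configurable and lets tests run without actually waiting.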

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-18 18:59:46 -03:00
docs Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
src feat: Enhance TikTok scraper with caption fetching and improved video discovery 2025-08-18 18:59:46 -03:00
test_data Add Instagram scraper with instaloader and parallel processing orchestrator 2025-08-18 12:56:57 -03:00
tests Add Instagram scraper with instaloader and parallel processing orchestrator 2025-08-18 12:56:57 -03:00
.gitignore Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
.python-version Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
claude.md Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
main.py Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
pyproject.toml Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
status.md Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00
test_real_data.py feat: Enhance TikTok scraper with caption fetching and improved video discovery 2025-08-18 18:59:46 -03:00
uv.lock Initial commit: Project foundation with base scraper and tests 2025-08-18 12:15:17 -03:00