- Add optional individual video page fetching for complete captions
- Implement profile scrolling to discover more videos (27+ vs 18)
- Add configurable rate limiting and anti-detection delays
- Fix RSS scrapers to support max_items parameter for backlog fetching
- Add fetch_captions parameter with max_caption_fetches limit
- Include additional metadata extraction (likes, comments, shares, duration)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Implement Instagram scraper with aggressive rate limiting
- Add orchestrator for running all scrapers in parallel
- Create comprehensive tests for Instagram scraper (11 tests)
- Create tests for orchestrator (9 tests)
- Fix Instagram test issues with post type detection
- All 60 tests passing successfully
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- YouTube channel scraper using yt-dlp
- Authentication and session persistence via cookies
- Humanized delays and rate limiting (2-5 seconds between requests)
- User agent rotation for stealth
- Incremental updates via state management
- Support for videos, shorts, and live streams detection
- All 11 tests passing
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Created base RSS scraper class with common functionality
- Implemented MailChimp RSS scraper for newsletters
- Implemented Podcast RSS scraper with audio/image extraction
- State management for incremental updates
- All 9 tests passing for RSS scrapers
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Created WordPressScraper class extending BaseScraper
- Fetches posts with pagination support
- Enriches posts with author, category, and tag information
- Implements incremental updates via state management
- Word count calculation for content
- All 11 tests passing
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Set up UV environment with all required packages
- Created comprehensive project structure
- Implemented abstract BaseScraper class with TDD
- Added documentation (project spec, implementation plan, status)
- Configured .env for credentials (not committed)
- All base scraper tests passing (9/9)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>