hvac-kia-content/UPDATED_CAPTURE_STATUS.md
Ben Reed 0a795437a7 Optimize Instagram scraper and increase capture targets to 1000
- Increased Instagram rate limit from 100 to 200 posts/hour
- Reduced delays: 10-20s (was 15-30s), extended breaks 30-60s (was 60-120s)
- Extended break interval: every 10 requests (was 5)
- Updated capture targets: 1000 posts for Instagram, 1000 videos for TikTok
- Added production deployment and monitoring scripts
- Created environment configuration template

This provides ~40-50% speed improvement for Instagram scraping and
captures 5x more Instagram content and 3.3x more TikTok content.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-18 22:59:11 -03:00


# HVAC Know It All - Updated Production Backlog Capture
## 🚀 Updated Configuration
**Started**: August 18, 2025 @ 10:54 PM ADT
### 📈 New Rate Limits & Targets
| Source | Previous Target | New Target | Rate Limit | Estimated Time |
|--------|-----------------|------------|------------|----------------|
| **Instagram** | 200 posts | **1000 posts** | 200/hour | ~5 hours |
| **TikTok** | 300 videos | **1000 videos** | Browser-based | ~2-3 hours |
### ⚡ Instagram Optimization Changes
- **Rate limit**: Increased from 100 to **200 posts/hour**
- **Delays**: Reduced from 15-30s to **10-20 seconds**
- **Extended breaks**: Every **10 requests** (was 5)
- **Break duration**: **30-60 seconds** (was 60-120s)
- **Speed improvement**: ~**40-50% faster**
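
As a rough check on these settings, the sketch below computes the average deliberate pause per request under the old and new schedules. It assumes delays and breaks are drawn uniformly from the stated ranges, that an extended break is taken in addition to the regular delay, and that network and processing time can be ignored. The function name is illustrative and is not taken from the scraper.

```python
def mean_pause_per_request(delay_range, break_range, break_every):
    """Average seconds of deliberate waiting per request.

    Assumes uniform draws from each (low, high) range and that the
    extended break is added on top of the normal delay (an assumption,
    not confirmed by the scraper's code).
    """
    mean_delay = (delay_range[0] + delay_range[1]) / 2
    mean_break = (break_range[0] + break_range[1]) / 2
    # One extended break is spread across `break_every` requests.
    return mean_delay + mean_break / break_every


# Old schedule: 15-30 s delays, 60-120 s break every 5 requests.
old_pause = mean_pause_per_request((15, 30), (60, 120), break_every=5)
# New schedule: 10-20 s delays, 30-60 s break every 10 requests.
new_pause = mean_pause_per_request((10, 20), (30, 60), break_every=10)

print(f"old: {old_pause:.1f} s/request, about {3600 / old_pause:.0f} requests/hour")
print(f"new: {new_pause:.1f} s/request, about {3600 / new_pause:.0f} requests/hour")
print(f"waiting time per request cut by {1 - new_pause / old_pause:.0%}")
```

Under these assumptions the deliberate waiting time drops from 40.5 s to 19.5 s per request. How that maps to posts per hour depends on how many posts each request returns, which this report does not state.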
### 🎯 TikTok Enhancements
- **Total videos**: 1000 (if available)
- **Videos with captions**: 100 (increased from 50)
- **Caption fetching**: Each captioned video's page is visited individually to retrieve detailed content
## 📊 Already Completed Sources
| Source | Items Captured | File Size | Status |
|--------|---------------|-----------|---------|
| **WordPress** | 139 posts | 1.5 MB | ✅ Complete |
| **Podcast** | 428 episodes | 727 KB | ✅ Complete |
| **YouTube** | 200 videos | 107 KB | ✅ Complete |
## 🔄 Currently Processing
- **Instagram**: Fetching 1000 posts with optimized rate limiting
- **Next**: TikTok, with a 1000-video target
## 📁 Output Location
```
/home/ben/dev/hvac-kia-content/data_production_backlog/markdown_current/
├── hvacknowitall_wordpress_backlog_[timestamp].md
├── hvacknowitall_podcast_backlog_[timestamp].md
├── hvacknowitall_youtube_backlog_[timestamp].md
├── hvacknowitall_instagram_backlog_[timestamp].md (pending)
└── hvacknowitall_tiktok_backlog_[timestamp].md (pending)
```
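To see which of these files exist at a given moment, a small check along the following lines may help. It is a sketch only: it assumes the `[timestamp]` part can be matched with a wildcard (the actual timestamp format is not given here), and `find_backlog_files` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

# Source names as they appear in the file names listed above.
SOURCES = ["wordpress", "podcast", "youtube", "instagram", "tiktok"]


def find_backlog_files(output_dir):
    """Map each source to the backlog files currently present for it."""
    root = Path(output_dir)
    if not root.is_dir():
        # Directory missing: report every source as having no files.
        return {source: [] for source in SOURCES}
    return {
        # The timestamp format is unknown, so match it with a wildcard.
        source: sorted(root.glob(f"hvacknowitall_{source}_backlog_*.md"))
        for source in SOURCES
    }


if __name__ == "__main__":
    output_dir = (
        "/home/ben/dev/hvac-kia-content/data_production_backlog/markdown_current"
    )
    for source, files in find_backlog_files(output_dir).items():
        status = files[-1].name if files else "pending"
        print(f"{source:<10} {status}")
```

Sources with no matching file are printed as `pending`, mirroring the annotations in the tree above.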
## 📈 Progress Monitoring
To monitor real-time progress:
```bash
# Watch Instagram progress
tail -f instagram_1000.log
# Check overall status
./monitor_backlog_progress.sh --live
```
## ⏱️ Time Estimates
- **Instagram**: ~5 hours for 1000 posts at 200/hour
- **TikTok**: ~2-3 hours for 1000 videos (depends on caption fetching)
- **Total remaining**: ~7-8 hours
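
The estimates above can be turned into approximate finish times. The sketch below assumes the two captures run back to back starting from the recorded start time (August 18, 2025 @ 10:54 PM ADT, taken as UTC-3) and that the rate limit is sustained throughout. The TikTok range is copied from the estimate, not derived.

```python
from datetime import datetime, timedelta, timezone

# ADT is UTC-3; the start time is the one recorded in this report.
ADT = timezone(timedelta(hours=-3), "ADT")
started = datetime(2025, 8, 18, 22, 54, tzinfo=ADT)

# Instagram: 1000 posts at 200 posts/hour.
instagram_hours = 1000 / 200
# TikTok: the 2-3 hour range is an estimate from this report.
tiktok_hours = (2, 3)

instagram_done = started + timedelta(hours=instagram_hours)
earliest_finish = instagram_done + timedelta(hours=tiktok_hours[0])
latest_finish = instagram_done + timedelta(hours=tiktok_hours[1])

fmt = "%b %d %H:%M %Z"
print("Instagram done:", instagram_done.strftime(fmt))
print("All done between", earliest_finish.strftime(fmt),
      "and", latest_finish.strftime(fmt))
```

Any pause, retry, or rate-limit backoff would push these times later, so they are best read as lower bounds.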
## 🎯 Final Deliverables
- **~2,767 total items** (767 already captured + up to 2000 new, if both targets are reached)
- **Specification-compliant markdown** for all sources
- **Media files** downloaded and organized
- **NAS synchronization** upon completion
## 📝 Notes
The increased targets will provide a much more comprehensive historical dataset:
- Instagram: 5x the originally planned content (1000 vs. 200 posts)
- TikTok: ~3.3x the originally planned content (1000 vs. 300 videos)
- This will capture a significant portion of the brand's social media history
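
The item total under Final Deliverables and the multipliers in these notes follow directly from the tables earlier in this report. A short sketch of the arithmetic, assuming both new targets are met in full:

```python
# Items already captured, from the "Already Completed Sources" table.
completed = {"WordPress": 139, "Podcast": 428, "YouTube": 200}

# (previous target, new target), from the "New Rate Limits & Targets" table.
targets = {"Instagram": (200, 1000), "TikTok": (300, 1000)}

already_captured = sum(completed.values())
new_items = sum(new_target for _, new_target in targets.values())
total_items = already_captured + new_items

# Ratio of new target to previous target for each source.
multipliers = {
    source: new_target / previous_target
    for source, (previous_target, new_target) in targets.items()
}

print(f"already captured: {already_captured}")
print(f"new items (if targets are met): {new_items}")
print(f"total: {total_items}")
for source, ratio in multipliers.items():
    print(f"{source}: {ratio:.1f}x the previous target")
```

If either platform has fewer than 1000 items available, the total will be correspondingly lower.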