# Social Media Competitive Intelligence Setup Guide

This guide covers the setup for Phase 2 social media competitive intelligence, featuring YouTube and Instagram competitor scrapers.

## Overview

The Phase 2 implementation includes:

### ✅ YouTube Competitive Scrapers (4 channels)

- **AC Service Tech** (@acservicetech)
- **Refrigeration Mentor** (@RefrigerationMentor)
- **Love2HVAC** (@Love2HVAC)
- **HVAC TV** (@HVACTV)

### ✅ Instagram Competitive Scrapers (3 accounts)

- **AC Service Tech** (@acservicetech)
- **Love2HVAC** (@love2hvac)
- **HVAC Learning Solutions** (@hvaclearningsolutions)

## Prerequisites

### Required Environment Variables

Add these to your `.env` file:

```bash
# Existing HKIA Environment Variables (keep these)
INSTAGRAM_USERNAME=hkia1
INSTAGRAM_PASSWORD=your_instagram_password
YOUTUBE_API_KEY=your_youtube_api_key_here
TIMEZONE=America/Halifax

# Competitive Intelligence (optional but recommended)
# Oxylabs proxy for anti-detection
OXYLABS_USERNAME=your_oxylabs_username
OXYLABS_PASSWORD=your_oxylabs_password
OXYLABS_PROXY_ENDPOINT=pr.oxylabs.io
OXYLABS_PROXY_PORT=7777

# Jina.ai for content extraction
JINA_API_KEY=your_jina_api_key
```

### API Keys and Credentials

1. **YouTube Data API v3** (Required)
   - Same key used for HKIA YouTube scraping
   - Quota: ~10,000 units per day (shared with HKIA)
2. **Instagram Credentials** (Required)
   - Uses the same HKIA credentials for competitive scraping
   - Implements aggressive rate limiting for compliance
3. **Oxylabs Proxy** (Optional but recommended)
   - For anti-detection and IP rotation
   - Sign up at https://oxylabs.io
   - Helps avoid rate limiting and blocks
4. **Jina.ai Reader** (Optional)
   - For enhanced content extraction
   - Sign up at https://jina.ai
   - Provides AI-powered content parsing

## Installation

### 1. Install Dependencies

All required dependencies are already declared in the project (`requirements.txt`):

```bash
# Install with uv (preferred; syncs from pyproject.toml / uv.lock)
uv sync

# Or with pip
pip install -r requirements.txt
```

### 2. Test Installation

Run the test suite to verify that everything is set up correctly:

```bash
python test_social_media_competitive.py
```

This tests:

- ✅ Orchestrator initialization
- ✅ Scraper configuration
- ✅ API connectivity
- ✅ Directory structure
- ✅ Content discovery (if API keys are available)

## Usage

### Quick Start Commands

```bash
# List all available competitors
python run_competitive_intelligence.py --operation list-competitors

# Test setup
python run_competitive_intelligence.py --operation test

# Get social media status
python run_competitive_intelligence.py --operation social-media-status
```

### Social Media Operations

```bash
# Run social media backlog capture (first time)
python run_competitive_intelligence.py --operation social-backlog --limit 20

# Run social media incremental sync (daily)
python run_competitive_intelligence.py --operation social-incremental

# Platform-specific operations
python run_competitive_intelligence.py --operation social-backlog --platforms youtube --limit 30
python run_competitive_intelligence.py --operation social-incremental --platforms instagram
```

### Analysis Operations

```bash
# Analyze YouTube competitors
python run_competitive_intelligence.py --operation platform-analysis --platforms youtube

# Analyze Instagram competitors
python run_competitive_intelligence.py --operation platform-analysis --platforms instagram
```

## Rate Limiting & Anti-Detection

### YouTube

- **API Quota**: 1-3 units per video (shared with HKIA)
- **Rate Limiting**: 2-second delays between requests
- **Proxy**: Optional but recommended for high-volume usage

### Instagram

- **Rate Limiting**: Very aggressive (15-30 second delays between requests)
- **Hourly Limit**: 50 requests maximum
- **Extended Breaks**: 45-90 seconds every 5 requests
- **Session Management**: Separate session files per competitor
- **Proxy**: Highly recommended to avoid IP blocking

## Data Storage Structure

```
data/
├── competitive_intelligence/
│   ├── ac_service_tech/
│   │   ├── backlog/
│   │   ├── incremental/
│   │   ├── analysis/
│   │   └── media/
│   ├── love2hvac/
│   ├── hvac_learning_solutions/
│   └── ...
└── .state/
    └── competitive/
        ├── competitive_ac_service_tech_state.json
        └── ...
```

## File Naming Convention

Files follow the pattern `competitive_<competitor>_<mode>_<YYYYMMDD>_<HHMMSS>.md`, where `<mode>` is `backlog` or `incremental`:

```
# YouTube competitor content
competitive_ac_service_tech_backlog_20250828_140530.md
competitive_love2hvac_incremental_20250828_141015.md

# Instagram competitor content
competitive_ac_service_tech_backlog_20250828_141530.md
competitive_hvac_learning_solutions_incremental_20250828_142015.md
```

## Automation & Scheduling

### Recommended Schedule

Cron interprets these times in the host's local timezone, so the schedule below assumes the server clock is set to Atlantic time (ADT).

```bash
# Morning sync (8:30 AM ADT) - after HKIA scraping
30 8 * * * cd /home/ben/dev/hvac-kia-content && python run_competitive_intelligence.py --operation social-incremental

# Afternoon sync (1:30 PM ADT) - after HKIA scraping
30 13 * * * cd /home/ben/dev/hvac-kia-content && python run_competitive_intelligence.py --operation social-incremental

# Weekly full analysis (Sundays at 9:00 AM and 9:30 AM)
0 9 * * 0 cd /home/ben/dev/hvac-kia-content && python run_competitive_intelligence.py --operation platform-analysis --platforms youtube
30 9 * * 0 cd /home/ben/dev/hvac-kia-content && python run_competitive_intelligence.py --operation platform-analysis --platforms instagram
```

## Monitoring & Logs

```bash
# Follow the orchestrator log
tail -f logs/competitive_intelligence/competitive_orchestrator.log

# Follow individual scraper logs
tail -f logs/competitive_intelligence/youtube_ac_service_tech.log
tail -f logs/competitive_intelligence/instagram_love2hvac.log
```

## Troubleshooting

### Common Issues

1. **YouTube API Quota Exceeded**

   ```bash
   # Check quota usage (case-insensitive match)
   grep -i "quota" logs/competitive_intelligence/*.log

   # Reduce frequency or limits
   python run_competitive_intelligence.py --operation social-backlog --platforms youtube --limit 10
   ```

2. **Instagram Rate Limited**

   ```bash
   # The Instagram scraper pauses automatically for 1 hour when rate limited
   # Check logs for rate limit messages
   grep -i "rate limit" logs/competitive_intelligence/instagram*.log
   ```

3. **Proxy Issues**

   ```bash
   # Test proxy connection
   python run_competitive_intelligence.py --operation test

   # Check proxy configuration (shows only variables exported in the current shell;
   # the .env file is not loaded automatically by the shell)
   echo "$OXYLABS_USERNAME"
   echo "$OXYLABS_PROXY_ENDPOINT"
   ```

4. **Session Issues (Instagram)**

   ```bash
   # Clear competitive sessions
   rm data/.sessions/competitive_*.session

   # Re-run with a fresh login
   python run_competitive_intelligence.py --operation social-incremental --platforms instagram
   ```

## Performance Considerations

### Resource Usage

- **Memory**: ~200-500 MB per scraper during operation
- **Storage**: ~10-50 MB per competitor per month
- **Network**: Conservative rate limiting keeps bandwidth use low

### Optimization Tips

1. Use a proxy for production usage
2. Schedule runs during off-peak hours
3. Monitor API quota usage
4. Start with small limits and scale up
5. Use incremental sync for regular updates

## Security & Compliance

### Data Privacy

- Only public content is scraped
- No private accounts or personal data
- Content is stored locally only
- GDPR compliant (public data only)

### Rate Limiting Compliance

- Instagram: Very conservative limits
- YouTube: API quota management
- Proxy rotation reduces the risk of IP blocking
- Respectful delays between requests

### Terms of Service

- All scrapers comply with platform ToS
- Public data only
- No automated posting or interactions
- Research/analysis use only

## Next Steps

1. **Phase 3**: Content Intelligence Analysis
   - AI-powered content analysis
   - Competitive positioning insights
   - Content gap identification
   - Publishing pattern analysis
2. **Future Enhancements**
   - LinkedIn competitive scraping
   - Twitter/X competitive monitoring
   - Automated competitive reports
   - Slack/email notifications

## Support

For issues or questions:

1. Check logs in `logs/competitive_intelligence/`
2. Run the test suite: `python test_social_media_competitive.py`
3. Test individual components: `python run_competitive_intelligence.py --operation test`

## Implementation Status

✅ **Phase 2 Complete**: Social Media Competitive Intelligence

- ✅ YouTube competitive scrapers (4 channels)
- ✅ Instagram competitive scrapers (3 accounts)
- ✅ Integrated orchestrator
- ✅ CLI commands
- ✅ Rate limiting & anti-detection
- ✅ State management
- ✅ Content discovery & scraping
- ✅ Analysis workflows
- ✅ Documentation & testing

**Ready for production use!**
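You can check the environment variables from the Prerequisites section before a first run or before enabling the cron jobs. The script below is a minimal preflight sketch. It assumes the `.env` values have already been exported into the process environment, since it does not parse `.env` itself. Following the API Keys section, it treats the YouTube and Instagram variables as required and the Oxylabs and Jina.ai ones as optional. The names `missing`, `main`, `REQUIRED`, and `OPTIONAL` are hypothetical and are not part of the project's CLI.

```python
import os
import sys

# Assumed grouping, following the "API Keys and Credentials" section.
REQUIRED = ["INSTAGRAM_USERNAME", "INSTAGRAM_PASSWORD", "YOUTUBE_API_KEY"]
# Optional but recommended; a missing one only produces a warning.
OPTIONAL = [
    "OXYLABS_USERNAME",
    "OXYLABS_PASSWORD",
    "OXYLABS_PROXY_ENDPOINT",
    "OXYLABS_PROXY_PORT",
    "JINA_API_KEY",
]


def missing(names, env=None) -> list:
    """Return the names that are unset or blank in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


def main() -> int:
    """Report missing variables on stderr; return 1 if any required one is absent."""
    absent_required = missing(REQUIRED)
    for name in absent_required:
        print(f"ERROR: {name} is not set", file=sys.stderr)
    for name in missing(OPTIONAL):
        print(f"warning: {name} is not set (optional)", file=sys.stderr)
    return 1 if absent_required else 0
```

`main()` returns 0 when every required variable is present and 1 otherwise. A wrapper can therefore call `sys.exit(main())` before it launches `run_competitive_intelligence.py`.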
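The Instagram limits under Rate Limiting & Anti-Detection combine into a single pacing policy:

- 15-30 second delays between requests
- a 45-90 second break every 5 requests
- at most 50 requests per hour

The class below is a minimal sketch of that policy under stated assumptions. It is not the project's actual scraper code. The class name and its parameters are hypothetical, and so is the choice to take the extended break *instead of* the normal delay.

```python
import random
import time
from collections import deque


class InstagramPacer:
    """Hypothetical pacer for the documented Instagram limits.

    Assumptions: 15-30 s between requests, a 45-90 s break used in place of
    that delay after every 5th request, and at most 50 requests in any
    rolling 60-minute window.
    """

    WINDOW = 3600.0  # length of the rolling hourly window, in seconds

    def __init__(self, clock=time.monotonic, sleep=time.sleep, rng=None,
                 max_per_hour=50, break_every=5):
        self.clock = clock
        self.sleep = sleep
        self.rng = rng if rng is not None else random.Random()
        self.max_per_hour = max_per_hour
        self.break_every = break_every
        self.sent = deque()  # timestamps of requests inside the window
        self.count = 0       # total requests made so far

    def wait(self) -> float:
        """Sleep as needed before the next request; return seconds slept."""
        slept = 0.0
        if self.count > 0:
            if self.count % self.break_every == 0:
                delay = self.rng.uniform(45, 90)  # extended break
            else:
                delay = self.rng.uniform(15, 30)  # normal spacing
            self.sleep(delay)
            slept += delay
        now = self.clock()
        # Forget requests that are at least an hour old.
        while self.sent and now - self.sent[0] >= self.WINDOW:
            self.sent.popleft()
        if len(self.sent) >= self.max_per_hour:
            # Hourly cap reached: wait until the oldest request ages out.
            pause = max(0.0, self.sent[0] + self.WINDOW - now)
            self.sleep(pause)
            slept += pause
            self.sent.popleft()
            now = self.clock()
        self.sent.append(now)
        self.count += 1
        return slept
```

In a scraper loop, call `pacer.wait()` immediately before each request. Because `clock` and `sleep` are injectable, you can exercise the pacing with a fake clock instead of real delays.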