# HKIA Content Analysis & Competitive Intelligence Implementation Plan

## Project Overview

Add comprehensive content analysis and competitive intelligence capabilities to the existing HKIA content aggregation system. This will provide daily insights on content performance, trending topics, competitor analysis, and strategic content opportunities.

## Architecture Summary

### Current System Integration

- **Base**: Extend existing `BaseScraper` architecture and `ContentOrchestrator`
- **LLM**: Claude Haiku for cost-effective content classification
- **APIs**: Jina.ai (existing credits), Oxylabs (existing credits), Anthropic API
- **Competitors**: HVACR School (blog), AC Service Tech, Refrigeration Mentor, Love2HVAC, HVAC TV (social)
- **Strategy**: One-time backlog capture + daily incremental + weekly metadata refresh
## Implementation Phases

### Phase 1: Foundation (Weeks 1-2)

**Goal**: Set up the content analysis framework for existing HKIA content

**Tasks**:

1. Create `src/content_analysis/` module structure
2. Implement `ClaudeHaikuAnalyzer` for content classification
3. Extend `BaseScraper` with analysis capabilities
4. Add analysis to existing scrapers (YouTube, Instagram, WordPress, etc.)
5. Create daily intelligence JSON output structure

**Deliverables**:

- Content classification for all existing HKIA sources
- Daily intelligence reports for HKIA content only
- Enhanced metadata in existing markdown files
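The `ClaudeHaikuAnalyzer` in task 2 could take roughly this shape — a minimal sketch, where the class interface, category list, and prompt are assumptions rather than existing code. It degrades gracefully when no API key is configured, so scrapers keep running without analysis:

```python
import json
import os

# Illustrative categories only; the real taxonomy would live in config.
TOPIC_CATEGORIES = ["refrigeration", "service", "installation", "controls", "business"]

class ClaudeHaikuAnalyzer:
    def __init__(self, api_key=None):
        # Fall back to the environment only when no key is given explicitly.
        self.api_key = api_key if api_key is not None else os.environ.get("ANTHROPIC_API_KEY")

    def classify(self, title, body):
        """Return {"topics": [...], "confidence": float}, or None when the API is unavailable."""
        if not self.api_key:
            return None  # graceful degradation: scraping continues without analysis
        from anthropic import Anthropic  # lazy import keeps the dependency optional
        client = Anthropic(api_key=self.api_key)
        prompt = (
            f"Classify this HVAC content into categories {TOPIC_CATEGORIES}.\n"
            f"Title: {title}\nBody: {body[:2000]}\n"
            'Reply with JSON only: {"topics": [...], "confidence": 0.0-1.0}'
        )
        message = client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=256,
            messages=[{"role": "user", "content": prompt}],
        )
        return self._parse(message.content[0].text)

    @staticmethod
    def _parse(text):
        # Tolerate malformed model output rather than crashing the scraper run.
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            return None
```

Keeping the Anthropic import lazy means the existing scrapers never require the SDK unless analysis is enabled.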
### Phase 2: Competitor Infrastructure (Weeks 3-4)

**Goal**: Build competitor scraping and state management infrastructure

**Tasks**:

1. Create `src/competitive_intelligence/` module structure
2. Implement Oxylabs proxy integration
3. Build competitor scraper base classes
4. Create state management for incremental updates
5. Implement HVACR School blog scraper (backlog + incremental)

**Deliverables**:

- Competitor scraping framework
- HVACR School full backlog capture
- HVACR School daily incremental scraping
- Competitor state management system
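The state files in task 4 could follow a simple seen-URL schema (an assumption — the plan only fixes the file locations under `data/.state/`):

```python
import json
from pathlib import Path

class CompetitorState:
    """Tracks which competitor URLs were already captured, so daily runs stay incremental."""

    def __init__(self, state_dir, competitor):
        self.path = Path(state_dir) / f"competitor_{competitor}_state.json"
        self.seen = set()
        if self.path.exists():
            self.seen = set(json.loads(self.path.read_text()).get("seen_urls", []))

    def new_items(self, urls):
        # Preserve scrape order while dropping anything captured before.
        return [u for u in urls if u not in self.seen]

    def mark_captured(self, urls):
        self.seen.update(urls)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps({"seen_urls": sorted(self.seen)}, indent=2))
```

A daily run filters the scraped URL list through `new_items()`, fetches only the new ones, then calls `mark_captured()` so the next run skips them.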
### Phase 3: Social Media Competitor Scrapers (Weeks 5-6)

**Goal**: Implement social media competitor tracking

**Tasks**:

1. Build YouTube competitor scrapers (4 channels)
2. Build Instagram competitor scrapers (3 accounts)
3. Implement backlog capture commands
4. Create weekly metadata refresh system
5. Add competitor content to intelligence analysis

**Deliverables**:

- Complete competitor social media backlog
- Daily incremental social media scraping
- Weekly engagement metrics updates
- Unified competitor intelligence reports
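For the weekly engagement metrics, one plausible YouTube formula is interactions per view — a sketch under that assumption; the actual per-source algorithms may differ:

```python
def youtube_engagement_rate(views, likes, comments):
    """(likes + comments) / views as a percentage; 0.0 for a video with no views yet."""
    if views <= 0:
        return 0.0
    return round(100 * (likes + comments) / views, 2)

# e.g. 16 views, 2 likes, 1 comment -> 18.75
```

Normalizing by views makes small and large channels comparable in the unified competitor reports.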
### Phase 4: Advanced Analytics (Weeks 7-8)

**Goal**: Add trend detection and strategic insights

**Tasks**:

1. Implement trend detection algorithms
2. Build content gap analysis
3. Create competitive positioning analysis
4. Add SEO opportunity identification (using Jina.ai)
5. Generate weekly/monthly intelligence summaries

**Deliverables**:

- Advanced trend detection
- Content gap identification
- Strategic content recommendations
- Comprehensive intelligence dashboard data
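Trend detection (task 1) can start as simple keyword-count growth between two periods — a minimal sketch under that assumption, before anything more sophisticated:

```python
from collections import Counter

def trending_keywords(previous, current, min_count=3):
    """Rank keywords by growth in mentions between two periods, ignoring rare terms."""
    prev, curr = Counter(previous), Counter(current)
    growth = {kw: curr[kw] - prev.get(kw, 0) for kw in curr if curr[kw] >= min_count}
    return sorted(growth, key=growth.get, reverse=True)
```

The `min_count` floor keeps one-off mentions from surfacing as trends; the threshold would be tuned against real keyword volumes.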
### Phase 5: Production Deployment (Weeks 9-10)

**Goal**: Deploy to production with monitoring

**Tasks**:

1. Set up production environment variables
2. Create systemd services and timers
3. Integrate with existing NAS sync
4. Add monitoring and error handling
5. Create operational documentation

**Deliverables**:

- Production-ready deployment
- Automated daily/weekly schedules
- Monitoring and alerting
- Operational runbooks
## Technical Architecture

### Module Structure

```
src/
├── content_analysis/
│   ├── __init__.py
│   ├── claude_analyzer.py           # Haiku-based content classification
│   ├── engagement_analyzer.py       # Metrics and trending analysis
│   ├── keyword_extractor.py         # SEO keyword identification
│   └── intelligence_aggregator.py   # Daily intelligence JSON generation
├── competitive_intelligence/
│   ├── __init__.py
│   ├── backlog_capture/
│   │   ├── __init__.py
│   │   ├── hvacrschool_backlog.py
│   │   ├── youtube_competitor_backlog.py
│   │   └── instagram_competitor_backlog.py
│   ├── incremental_scrapers/
│   │   ├── __init__.py
│   │   ├── hvacrschool_incremental.py
│   │   ├── youtube_competitor_daily.py
│   │   └── instagram_competitor_daily.py
│   ├── metadata_refreshers/
│   │   ├── __init__.py
│   │   ├── youtube_engagement_updater.py
│   │   └── instagram_engagement_updater.py
│   └── analysis/
│       ├── __init__.py
│       ├── competitive_gap_analyzer.py
│       ├── trend_analyzer.py
│       └── strategic_insights.py
└── orchestrators/
    ├── __init__.py
    ├── content_analysis_orchestrator.py
    └── competitive_intelligence_orchestrator.py
```
### Data Structure

```
data/
├── intelligence/
│   ├── daily/
│   │   └── hkia_intelligence_YYYY-MM-DD.json
│   ├── weekly/
│   │   └── hkia_weekly_intelligence_YYYY-MM-DD.json
│   └── monthly/
│       └── hkia_monthly_intelligence_YYYY-MM.json
├── competitor_content/
│   ├── hvacrschool/
│   │   ├── markdown_current/
│   │   ├── markdown_archives/
│   │   └── .state/
│   ├── acservicetech/
│   ├── refrigerationmentor/
│   ├── love2hvac/
│   └── hvactv/
└── .state/
    ├── competitor_hvacrschool_state.json
    ├── competitor_acservicetech_youtube_state.json
    └── ...
```
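A small helper can derive the dated report paths shown above (the function name is hypothetical; only the filename pattern comes from this layout):

```python
import datetime

def daily_report_path(base="data/intelligence/daily", day=None):
    """Dated path for a daily intelligence report, matching the layout above."""
    day = day or datetime.date.today()
    return f"{base}/hkia_intelligence_{day:%Y-%m-%d}.json"
```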
### Environment Variables

```bash
# Content Analysis
ANTHROPIC_API_KEY=your_claude_key
JINA_AI_API_KEY=your_existing_jina_key

# Competitor Scraping
OXYLABS_RESIDENTIAL_PROXY_ENDPOINT=your_endpoint
OXYLABS_USERNAME=your_username
OXYLABS_PASSWORD=your_password

# Competitor Targets
COMPETITOR_YOUTUBE_CHANNELS=acservicetech,refrigerationmentor,love2hvac,hvactv
COMPETITOR_INSTAGRAM_ACCOUNTS=acservicetech,love2hvac
COMPETITOR_BLOGS=hvacrschool.com
```
### Production Schedule

```
Daily:
- 8:00 AM: HKIA content scraping (existing)
- 12:00 PM: HKIA content scraping (existing)
- 6:00 PM: Competitor incremental scraping
- 7:00 PM: Daily content analysis & intelligence generation

Weekly:
- Sunday 6:00 AM: Competitor metadata refresh

On-demand:
- Competitor backlog capture commands
- Force refresh commands
```
### systemd Services

```bash
# Daily content analysis
/etc/systemd/system/hkia-content-analysis.service
/etc/systemd/system/hkia-content-analysis.timer

# Daily competitor incremental
/etc/systemd/system/hkia-competitor-incremental.service
/etc/systemd/system/hkia-competitor-incremental.timer

# Weekly competitor metadata refresh
/etc/systemd/system/hkia-competitor-metadata-refresh.service
/etc/systemd/system/hkia-competitor-metadata-refresh.timer

# On-demand backlog capture
/etc/systemd/system/hkia-competitor-backlog.service
```
## Cost Estimates

**Monthly Operational Costs:**

- Claude Haiku API: $15-25/month (content classification)
- Jina.ai: $0 (existing credits)
- Oxylabs: $0 (existing credits)
- **Total: $15-25/month**
## Success Metrics

1. **Content Intelligence**: Daily classification of 100% of HKIA content
2. **Competitive Coverage**: Track 100% of competitor new content within 24 hours
3. **Strategic Insights**: Generate 3-5 actionable content opportunities daily
4. **Performance**: All analysis completed within the 2-hour daily window
5. **Cost Efficiency**: Stay under $30/month operational costs
## Risk Mitigation

1. **Rate Limiting**: Implement exponential backoff and respect competitor ToS
2. **API Costs**: Monitor Claude Haiku usage, implement batching for efficiency
3. **Proxy Reliability**: Failover logic for Oxylabs proxy issues
4. **Data Storage**: Automated cleanup of old intelligence data
5. **System Load**: Schedule analysis during low-traffic periods
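The exponential backoff in item 1 can be sketched as follows (retry counts and delays are assumptions; production code would add jitter and catch specific scraper errors rather than bare `Exception`):

```python
import time

def with_backoff(fn, retries=4, base=1.0, cap=30.0, sleep=time.sleep):
    """Retry fn with exponentially growing delays (base * 2**attempt seconds, capped)."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:  # production code would scope this to transient errors
            if attempt == retries - 1:
                raise  # out of retries: surface the failure to monitoring
            sleep(min(cap, base * 2 ** attempt))
```

Injecting `sleep` as a parameter keeps the retry logic unit-testable without real delays.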
## Commands for Implementation

### Development Setup

```bash
# Add new dependencies
uv add anthropic jina-ai requests-oauthlib

# Create module structure
mkdir -p src/content_analysis src/competitive_intelligence/{backlog_capture,incremental_scrapers,metadata_refreshers,analysis} src/orchestrators

# Test content analysis on existing data
uv run python test_content_analysis.py

# Test competitor scraping
uv run python test_competitor_scraping.py
```
### Backlog Capture (One-time)

```bash
# Capture HVACR School full blog
uv run python -m src.competitive_intelligence.backlog_capture --competitor hvacrschool

# Capture competitor social media backlogs
uv run python -m src.competitive_intelligence.backlog_capture --competitor acservicetech --platforms youtube,instagram

# Force re-capture if needed
uv run python -m src.competitive_intelligence.backlog_capture --force
```
### Production Operations

```bash
# Manual intelligence generation
uv run python -m src.orchestrators.content_analysis_orchestrator

# Manual competitor incremental scraping
uv run python -m src.orchestrators.competitive_intelligence_orchestrator --mode incremental

# Weekly metadata refresh
uv run python -m src.orchestrators.competitive_intelligence_orchestrator --mode metadata-refresh

# View latest intelligence
cat data/intelligence/daily/hkia_intelligence_$(date +%Y-%m-%d).json | jq
```
## Next Steps

1. **Immediate**: Begin Phase 1 implementation with content analysis framework
2. **Week 1**: Set up Claude Haiku integration and test on existing HKIA content
3. **Week 2**: Complete content classification for all current sources
4. **Week 3**: Begin competitor infrastructure development
5. **Week 4**: Deploy HVACR School competitor tracking

This plan provides a structured approach to implementing comprehensive content analysis and competitive intelligence while leveraging existing infrastructure and maintaining cost efficiency.