HKIA Content Analysis & Competitive Intelligence Implementation Plan
Project Overview
Add comprehensive content analysis and competitive intelligence capabilities to the existing HKIA content aggregation system. This will provide daily insights on content performance, trending topics, competitor analysis, and strategic content opportunities.
Architecture Summary
Current System Integration
- Base: Extend the existing `BaseScraper` architecture and `ContentOrchestrator`
- LLM: Claude Haiku for cost-effective content classification
- APIs: Jina.ai (existing credits), Oxylabs (existing credits), Anthropic API
- Competitors: HVACR School (blog), AC Service Tech, Refrigeration Mentor, Love2HVAC, HVAC TV (social)
- Strategy: One-time backlog capture + daily incremental + weekly metadata refresh
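A minimal sketch of the integration point described above, assuming a hypothetical mixin and item-dict shape (the class and method names are illustrative, not the existing API):

```python
# Sketch: optional AI analysis hook layered onto the existing scraper base class.
# `scrape()` and the item dict shape are assumptions about the current codebase;
# the analyzer is injected so scrapers keep working unchanged when no
# ANTHROPIC_API_KEY is configured (graceful degradation).
from typing import Optional


class AnalyzableScraperMixin:
    """Adds optional content analysis to any scraper subclass."""

    def __init__(self, *args, analyzer: Optional["ClaudeHaikuAnalyzer"] = None, **kwargs):
        super().__init__(*args, **kwargs)
        self.analyzer = analyzer  # None => scrape-only behaviour, unchanged

    def scrape_and_analyze(self) -> list[dict]:
        items = self.scrape()  # provided by the concrete scraper
        if self.analyzer is None:
            return items  # backward compatible: no analysis, no API calls
        for item in items:
            item["analysis"] = self.analyzer.classify(item.get("content", ""))
        return items
```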
Implementation Phases
Phase 1: Foundation (Week 1-2)
Goal: Set up content analysis framework for existing HKIA content
Tasks:
- Create the `src/content_analysis/` module structure
- Implement `ClaudeHaikuAnalyzer` for content classification
- Extend `BaseScraper` with analysis capabilities
- Add analysis to existing scrapers (YouTube, Instagram, WordPress, etc.)
- Create the daily intelligence JSON output structure
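A rough sketch of the `ClaudeHaikuAnalyzer` task above, using the `anthropic` Python SDK's Messages API; the categories, prompt wording, and model string are assumptions to refine during implementation:

```python
# Sketch of claude_analyzer.py: classify one piece of HVAC content with Claude Haiku.
# The category list, prompt, and model ID are placeholders, not final choices.
import json
import os

import anthropic

CATEGORIES = ["installation", "maintenance", "refrigeration", "troubleshooting", "product", "other"]


class ClaudeHaikuAnalyzer:
    def __init__(self, model: str = "claude-3-haiku-20240307"):
        # Raises if ANTHROPIC_API_KEY is missing; callers can catch this and skip analysis.
        self.client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
        self.model = model

    def classify(self, text: str) -> dict:
        prompt = (
            "Classify this HVAC content. Respond with JSON containing "
            f"'topic' (one of {CATEGORIES}) and 'keywords' (up to 5 terms).\n\n{text[:4000]}"
        )
        message = self.client.messages.create(
            model=self.model,
            max_tokens=300,
            messages=[{"role": "user", "content": prompt}],
        )
        try:
            return json.loads(message.content[0].text)
        except json.JSONDecodeError:
            return {"topic": "other", "keywords": []}  # degrade gracefully on malformed output
```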
Deliverables:
- Content classification for all existing HKIA sources
- Daily intelligence reports for HKIA content only
- Enhanced metadata in existing markdown files
Phase 2: Competitor Infrastructure (Week 3-4)
Goal: Build competitor scraping and state management infrastructure
Tasks:
- Create the `src/competitive_intelligence/` module structure
- Implement Oxylabs proxy integration
- Build competitor scraper base classes
- Create state management for incremental updates
- Implement HVACR School blog scraper (backlog + incremental)
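One way the state-management task above could work, persisting a small JSON file per competitor so incremental runs only fetch unseen URLs (the file layout and field names are assumptions, chosen to match the `data/.state/` layout later in this plan):

```python
# Sketch of incremental state handling: a JSON file per competitor records which
# URLs have already been captured and when the scraper last ran.
import json
from datetime import datetime, timezone
from pathlib import Path

STATE_DIR = Path("data/.state")


def load_state(competitor: str) -> dict:
    path = STATE_DIR / f"competitor_{competitor}_state.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"seen_urls": [], "last_run": None}


def save_state(competitor: str, state: dict) -> None:
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    state["last_run"] = datetime.now(timezone.utc).isoformat()
    (STATE_DIR / f"competitor_{competitor}_state.json").write_text(json.dumps(state, indent=2))


def filter_new(competitor: str, urls: list[str]) -> list[str]:
    state = load_state(competitor)
    new_urls = [u for u in urls if u not in set(state["seen_urls"])]
    state["seen_urls"].extend(new_urls)
    save_state(competitor, state)
    return new_urls
```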
Deliverables:
- Competitor scraping framework
- HVACR School full backlog capture
- HVACR School daily incremental scraping
- Competitor state management system
Phase 3: Social Media Competitor Scrapers (Week 5-6)
Goal: Implement social media competitor tracking
Tasks:
- Build YouTube competitor scrapers (4 channels)
- Build Instagram competitor scrapers (3 accounts)
- Implement backlog capture commands
- Create weekly metadata refresh system
- Add competitor content to intelligence analysis
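The weekly metadata refresh above mainly re-reads engagement counts and recomputes per-platform engagement rates; the formulas below are common conventions, not settled choices for this project:

```python
# Sketch: platform-specific engagement rates used when refreshing competitor metadata.
# YouTube is normalised by views, Instagram by follower count; both are assumptions.
def youtube_engagement_rate(views: int, likes: int, comments: int) -> float:
    if views == 0:
        return 0.0
    return round(100 * (likes + comments) / views, 2)


def instagram_engagement_rate(followers: int, likes: int, comments: int) -> float:
    if followers == 0:
        return 0.0
    return round(100 * (likes + comments) / followers, 2)
```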
Deliverables:
- Complete competitor social media backlog
- Daily incremental social media scraping
- Weekly engagement metrics updates
- Unified competitor intelligence reports
Phase 4: Advanced Analytics (Week 7-8)
Goal: Add trend detection and strategic insights
Tasks:
- Implement trend detection algorithms
- Build content gap analysis
- Create competitive positioning analysis
- Add SEO opportunity identification (using Jina.ai)
- Generate weekly/monthly intelligence summaries
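For the trend-detection task above, a simple starting point is comparing keyword frequencies in the current window against a longer baseline window; the thresholds and window sizes below are placeholders:

```python
# Sketch: flag keywords whose frequency in the last 7 days clearly outpaces the
# preceding 30-day baseline. Thresholds and window sizes are illustrative only.
from collections import Counter


def trending_keywords(recent: Counter, baseline: Counter,
                      min_count: int = 5, min_ratio: float = 2.0) -> list[tuple[str, float]]:
    trends = []
    for keyword, count in recent.items():
        if count < min_count:
            continue
        # Normalise the baseline to a 7-day equivalent so the windows are comparable.
        baseline_weekly = baseline.get(keyword, 0) * 7 / 30
        ratio = count / max(baseline_weekly, 1.0)
        if ratio >= min_ratio:
            trends.append((keyword, round(ratio, 2)))
    return sorted(trends, key=lambda t: t[1], reverse=True)
```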
Deliverables:
- Advanced trend detection
- Content gap identification
- Strategic content recommendations
- Comprehensive intelligence dashboard data
Phase 5: Production Deployment (Week 9-10)
Goal: Deploy to production with monitoring
Tasks:
- Set up production environment variables
- Create systemd services and timers
- Integrate with existing NAS sync
- Add monitoring and error handling
- Create operational documentation
Deliverables:
- Production-ready deployment
- Automated daily/weekly schedules
- Monitoring and alerting
- Operational runbooks
Technical Architecture
Module Structure
src/
├── content_analysis/
│ ├── __init__.py
│ ├── claude_analyzer.py # Haiku-based content classification
│ ├── engagement_analyzer.py # Metrics and trending analysis
│ ├── keyword_extractor.py # SEO keyword identification
│ └── intelligence_aggregator.py # Daily intelligence JSON generation
├── competitive_intelligence/
│ ├── __init__.py
│ ├── backlog_capture/
│ │ ├── __init__.py
│ │ ├── hvacrschool_backlog.py
│ │ ├── youtube_competitor_backlog.py
│ │ └── instagram_competitor_backlog.py
│ ├── incremental_scrapers/
│ │ ├── __init__.py
│ │ ├── hvacrschool_incremental.py
│ │ ├── youtube_competitor_daily.py
│ │ └── instagram_competitor_daily.py
│ ├── metadata_refreshers/
│ │ ├── __init__.py
│ │ ├── youtube_engagement_updater.py
│ │ └── instagram_engagement_updater.py
│ └── analysis/
│ ├── __init__.py
│ ├── competitive_gap_analyzer.py
│ ├── trend_analyzer.py
│ └── strategic_insights.py
└── orchestrators/
├── __init__.py
├── content_analysis_orchestrator.py
└── competitive_intelligence_orchestrator.py
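As an illustration of `keyword_extractor.py`, matching against a curated HVAC term list keeps extraction deterministic and free of API cost; the short term list here is a placeholder for the full vocabulary:

```python
# Sketch: count occurrences of known HVAC terms in a piece of content.
# HVAC_TERMS would be the full curated vocabulary; these few entries are examples only.
import re
from collections import Counter

HVAC_TERMS = ["refrigeration", "superheat", "subcooling", "heat pump", "compressor", "txv"]


def extract_keywords(text: str) -> Counter:
    lowered = text.lower()
    counts = Counter()
    for term in HVAC_TERMS:
        hits = len(re.findall(rf"\b{re.escape(term)}\b", lowered))
        if hits:
            counts[term] = hits
    return counts
```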
Data Structure
data/
├── intelligence/
│ ├── daily/
│ │ └── hkia_intelligence_YYYY-MM-DD.json
│ ├── weekly/
│ │ └── hkia_weekly_intelligence_YYYY-MM-DD.json
│ └── monthly/
│ └── hkia_monthly_intelligence_YYYY-MM.json
├── competitor_content/
│ ├── hvacrschool/
│ │ ├── markdown_current/
│ │ ├── markdown_archives/
│ │ └── .state/
│ ├── acservicetech/
│ ├── refrigerationmentor/
│ ├── love2hvac/
│ └── hvactv/
└── .state/
├── competitor_hvacrschool_state.json
├── competitor_acservicetech_youtube_state.json
└── ...
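The daily intelligence file above might look roughly like this; every field name is a working assumption for `intelligence_aggregator.py`:

```python
# Sketch: the rough shape of data/intelligence/daily/hkia_intelligence_YYYY-MM-DD.json,
# as intelligence_aggregator.py might write it. All field names are provisional.
import json
from datetime import date
from pathlib import Path

report = {
    "date": date.today().isoformat(),
    "sources_analyzed": ["youtube", "instagram", "wordpress"],
    "top_keywords": [{"term": "refrigeration", "mentions": 0}],
    "engagement": {"youtube_avg_rate": 0.0, "instagram_avg_rate": 0.0},
    "high_performers": [],          # items above per-platform engagement thresholds
    "content_opportunities": [],    # 3-5 suggested topics for the day
}

out_dir = Path("data/intelligence/daily")
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / f"hkia_intelligence_{report['date']}.json").write_text(json.dumps(report, indent=2))
```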
Environment Variables
# Content Analysis
ANTHROPIC_API_KEY=your_claude_key
JINA_AI_API_KEY=your_existing_jina_key
# Competitor Scraping
OXYLABS_RESIDENTIAL_PROXY_ENDPOINT=your_endpoint
OXYLABS_USERNAME=your_username
OXYLABS_PASSWORD=your_password
# Competitor Targets
COMPETITOR_YOUTUBE_CHANNELS=acservicetech,refrigerationmentor,love2hvac,hvactv
COMPETITOR_INSTAGRAM_ACCOUNTS=acservicetech,love2hvac
COMPETITOR_BLOGS=hvacrschool.com
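The scrapers might consume these variables roughly as follows; the `user:pass@endpoint` proxy URL format is an assumption to verify against the Oxylabs account documentation:

```python
# Sketch: build a requests-compatible proxy mapping and competitor lists from the
# environment variables above.
import os


def oxylabs_proxies() -> dict:
    endpoint = os.environ["OXYLABS_RESIDENTIAL_PROXY_ENDPOINT"]
    user = os.environ["OXYLABS_USERNAME"]
    password = os.environ["OXYLABS_PASSWORD"]
    proxy_url = f"http://{user}:{password}@{endpoint}"
    return {"http": proxy_url, "https": proxy_url}


def competitor_youtube_channels() -> list[str]:
    raw = os.environ.get("COMPETITOR_YOUTUBE_CHANNELS", "")
    return [c.strip() for c in raw.split(",") if c.strip()]
```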
Production Schedule
Daily:
- 8:00 AM: HKIA content scraping (existing)
- 12:00 PM: HKIA content scraping (existing)
- 6:00 PM: Competitor incremental scraping
- 7:00 PM: Daily content analysis & intelligence generation
Weekly:
- Sunday 6:00 AM: Competitor metadata refresh
On-demand:
- Competitor backlog capture commands
- Force refresh commands
systemd Services
# Daily content analysis
/etc/systemd/system/hkia-content-analysis.service
/etc/systemd/system/hkia-content-analysis.timer
# Daily competitor incremental
/etc/systemd/system/hkia-competitor-incremental.service
/etc/systemd/system/hkia-competitor-incremental.timer
# Weekly competitor metadata refresh
/etc/systemd/system/hkia-competitor-metadata-refresh.service
/etc/systemd/system/hkia-competitor-metadata-refresh.timer
# On-demand backlog capture
/etc/systemd/system/hkia-competitor-backlog.service
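A possible shape for one service/timer pair; the working directory, `uv` path, and environment file location are assumptions about the deployment host:

```ini
# /etc/systemd/system/hkia-content-analysis.service (sketch)
[Unit]
Description=HKIA daily content analysis and intelligence generation
After=network-online.target

[Service]
Type=oneshot
WorkingDirectory=/opt/hkia
EnvironmentFile=/opt/hkia/.env
ExecStart=/usr/local/bin/uv run python -m src.orchestrators.content_analysis_orchestrator

# /etc/systemd/system/hkia-content-analysis.timer (sketch)
[Unit]
Description=Run HKIA content analysis daily at 7 PM

[Timer]
OnCalendar=*-*-* 19:00:00
Persistent=true

[Install]
WantedBy=timers.target
```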
Cost Estimates
Monthly Operational Costs:
- Claude Haiku API: $15-25/month (content classification)
- Jina.ai: $0 (existing credits)
- Oxylabs: $0 (existing credits)
- Total: $15-25/month
Success Metrics
- Content Intelligence: Daily classification of 100% of HKIA content
- Competitive Coverage: Track 100% of new competitor content within 24 hours
- Strategic Insights: Generate 3-5 actionable content opportunities daily
- Performance: Complete all analysis within a 2-hour daily window
- Cost Efficiency: Stay under $30/month operational costs
Risk Mitigation
- Rate Limiting: Implement exponential backoff and respect competitor ToS
- API Costs: Monitor Claude Haiku usage, implement batching for efficiency
- Proxy Reliability: Failover logic for Oxylabs proxy issues
- Data Storage: Automated cleanup of old intelligence data
- System Load: Schedule analysis during low-traffic periods
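For the rate-limiting item above, a minimal backoff helper might look like this; the retry count and delays are placeholders:

```python
# Sketch: retry a callable with exponential backoff plus jitter, the pattern intended
# for both competitor fetches and Claude Haiku API calls.
import random
import time


def with_backoff(func, max_attempts: int = 5, base_delay: float = 1.0):
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```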
Commands for Implementation
Development Setup
# Add new dependencies
uv add anthropic jina-ai requests-oauthlib
# Create module structure
mkdir -p src/content_analysis src/competitive_intelligence/{backlog_capture,incremental_scrapers,metadata_refreshers,analysis} src/orchestrators
# Test content analysis on existing data
uv run python test_content_analysis.py
# Test competitor scraping
uv run python test_competitor_scraping.py
Backlog Capture (One-time)
# Capture HVACR School full blog
uv run python -m src.competitive_intelligence.backlog_capture --competitor hvacrschool
# Capture competitor social media backlogs
uv run python -m src.competitive_intelligence.backlog_capture --competitor acservicetech --platforms youtube,instagram
# Force re-capture if needed
uv run python -m src.competitive_intelligence.backlog_capture --force
Production Operations
# Manual intelligence generation
uv run python -m src.orchestrators.content_analysis_orchestrator
# Manual competitor incremental scraping
uv run python -m src.orchestrators.competitive_intelligence_orchestrator --mode incremental
# Weekly metadata refresh
uv run python -m src.orchestrators.competitive_intelligence_orchestrator --mode metadata-refresh
# View latest intelligence
cat data/intelligence/daily/hkia_intelligence_$(date +%Y-%m-%d).json | jq
Next Steps
- Immediate: Begin Phase 1 implementation with content analysis framework
- Week 1: Set up Claude Haiku integration and test on existing HKIA content
- Week 2: Complete content classification for all current sources
- Week 3: Begin competitor infrastructure development
- Week 4: Deploy HVACR School competitor tracking
This plan provides a structured approach to implementing comprehensive content analysis and competitive intelligence while leveraging existing infrastructure and maintaining cost efficiency.