hvac-marketing-skills/COMPENDIUM_INTEGRATION.md
bengizmo 1e70d8387b
feat: fork marketingskills → HVAC Marketing Skills for Compendium
- Forked from coreyhaines31/marketingskills v1.1.0 (MIT license)
- Removed 4 SaaS-only skills (churn-prevention, paywall-upgrade-cro, onboarding-cro, signup-flow-cro)
- Reworked 2 skills (popup-cro → hvac-estimate-popups, revops → hvac-lead-ops)
- Adapted all 28 retained skills with HVAC industry context and Compendium integration
- Created 10 new HVAC-specific skills:
  - hvac-content-from-data (flagship DB integration)
  - hvac-seasonal-campaign (demand cycle marketing)
  - hvac-review-management (GBP review strategy)
  - hvac-video-repurpose (long-form → social)
  - hvac-technical-content (audience-calibrated writing)
  - hvac-brand-voice (trade authenticity guide)
  - hvac-contractor-website-audit (discovery & analysis)
  - hvac-contractor-website-package (marketing package assembly)
  - hvac-compliance-claims (EPA/rebate/safety claim checking)
  - hvac-content-qc (fact-check & citation gate)
- Renamed product-marketing-context → hvac-marketing-context (global)
- Created COMPENDIUM_INTEGRATION.md (shared integration contract)
- Added Compendium wrapper tools (search, scrape, classify)
- Added compendium capability tags to YAML frontmatter
- Updated README, AGENTS.md, CLAUDE.md, VERSIONS.md, marketplace.json
- All 38 skills pass validate-skills.sh
- Zero dangling references to removed/renamed skills

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 21:05:49 -03:00


# Compendium Integration Contract
> Single source of truth for all Compendium platform access from HVAC marketing skills.
> Skills reference this file instead of embedding raw SQL/API snippets.
## Quick Reference
| Service | Endpoint | Auth | Rate Limit |
|---------|----------|------|------------|
| PostgreSQL (via MCP) | `mcp__postgres__execute_sql` | MCP connection | No hard limit |
| Search Router | `http://192.168.10.249:30099` | None (internal) | No hard limit |
| Scrape Router | `http://192.168.10.249:30098` | None (internal) | ~10K pages/day |
| Classification API | `http://192.168.10.249:30080/api/v2/content-classification/` | None (internal) | 100 req/hr |
| Live Browse | `mcp__playwright__browser_*` | MCP connection | Session-based |
| Zen Analysis | `mcp__zen__analyze` / `mcp__zen__thinkdeep` | MCP connection | No hard limit |
## Health Checks
```bash
# PostgreSQL (MCP tool call, not a shell command)
mcp__postgres__execute_sql: "SELECT 1"
# Search Router
curl -s http://192.168.10.249:30099/health
# Scrape Router
curl -s http://192.168.10.249:30098/health
# Classification API
curl -s http://192.168.10.249:30080/api/v2/content-classification/health/
```
---
## Tool Tiers
Skills declare which tiers they use via `compendium.tools` in YAML frontmatter.
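For example, a skill's frontmatter might declare its tiers like this (a sketch — only `compendium.tools` and `mode` are defined by this contract; the other fields are illustrative):

```yaml
---
name: hvac-content-from-data
description: Generate content briefs from Compendium data
compendium:
  tools: [db, classify]
  mode: optional   # "required" makes Compendium a hard dependency
---
```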
### Tier: DB Query (`db`)
**Tool**: `mcp__postgres__execute_sql`
Direct SQL access to the Compendium database. Use parameterized queries.
### Tier: Live Browse (`browse`)
**Tool**: `mcp__playwright__browser_*`
View live social profiles, competitor websites, current SERP results. Use for visual audits and real-time data.
### Tier: Search (`search`)
**Tools**: Claude Code `WebSearch` + Search Router API
For competitive research, content discovery, trend validation. Prefer `WebSearch` for general queries; use Search Router for HVAC-specific indexed content.
### Tier: Scrape (`scrape`)
**Tool**: Scrape Router API (`http://192.168.10.249:30098`)
Full page content extraction and competitor copy harvesting.
```bash
# Basic scrape
curl -X POST http://192.168.10.249:30098/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example-hvac.com", "extract_text": true}'
```
### Tier: Classify (`classify`)
**Tool**: Classification API
Classify content against HVAC taxonomy (8 dimensions).
```bash
curl -X POST http://192.168.10.249:30080/api/v2/content-classification/classify/ \
  -H "Content-Type: application/json" \
  -d '{"text": "content to classify", "content_type": "article"}'
```
### Tier: Analyze (`analyze`)
**Tools**: `mcp__zen__analyze` / `mcp__zen__thinkdeep`
Deep analysis of content strategy, competitive positioning, and complex marketing decisions.
### Tier: Web Fetch (`fetch`)
**Tool**: Claude Code `WebFetch`
Quick page reads, API documentation lookup, lightweight scraping.
---
## Reusable Query Templates
### Content Discovery
```sql
-- trending-topics: What topics are gaining traction this week
SELECT topic, article_count, avg_engagement, trend_direction
FROM mv_topic_trends_weekly
WHERE trend_direction = 'rising'
ORDER BY article_count DESC
LIMIT 20;
-- topic-saturation: Find underserved content opportunities
SELECT topic, total_articles, avg_quality_score, saturation_level
FROM mv_topic_saturation
WHERE saturation_level IN ('low', 'medium')
ORDER BY total_articles ASC
LIMIT 20;
-- content-freshness: Find content that needs updating or repurposing
SELECT title, source_type, published_date, freshness_status
FROM mv_content_freshness
WHERE type != 'email'
AND freshness_status IN ('stale', 'aging')
ORDER BY published_date ASC
LIMIT 20;
```
### Quotes & Statistics
```sql
-- notable-quotes: Source real quotes for content
SELECT speaker_name, quote_text, source_title, source_type, published_date
FROM mv_notable_quotes
WHERE speaker_name IS NOT NULL
ORDER BY published_date DESC
LIMIT 20;
-- industry-statistics: Data points for authority content
SELECT stat_text, stat_value, source_name, source_url, source_verified, collected_date
FROM intelligence.statistics
WHERE source_verified = true
ORDER BY collected_date DESC
LIMIT 30;
-- unverified-statistics: Use with "industry estimate" label
SELECT stat_text, stat_value, source_name, collected_date
FROM intelligence.statistics
WHERE source_verified = false OR source_verified IS NULL
ORDER BY collected_date DESC
LIMIT 20;
```
### Influencer & Brand Tracking
```sql
-- influencer-activity: Track HVAC industry voices
SELECT influencer_name, platform, follower_count, post_count, avg_engagement
FROM mv_influencer_activity
ORDER BY avg_engagement DESC
LIMIT 20;
-- brand-content: Track brand mentions and content
SELECT brand_name, content_count, avg_sentiment, last_mentioned
FROM mv_brand_content_tracker
ORDER BY content_count DESC
LIMIT 20;
```
### Social Media Insights
```sql
-- instagram-posts: Enriched Instagram content data
SELECT username, caption, like_count, comment_count, posted_at, content_type
FROM v_instagram_posts_enriched
ORDER BY like_count DESC
LIMIT 20;
-- optimal-posting: Best times to post by platform
SELECT platform, day_of_week, hour_of_day, avg_engagement
FROM mv_optimal_posting_times
ORDER BY avg_engagement DESC;
```
### Content Classification
```sql
-- classified-content: Browse classified content by type and quality
SELECT title, source_type, technical_level, audience_segment,
       quality_score, confidence_score, classification_date
FROM v_content_classified
WHERE confidence_score >= 0.6
ORDER BY classification_date DESC
LIMIT 30;
```
### Contractor Discovery
```sql
-- contractors: HVAC contractor data for local marketing
SELECT company_name, city, state, phone, website, rating, review_count
FROM hvac.contractors
WHERE website IS NOT NULL
ORDER BY review_count DESC
LIMIT 20;
```
---
## Data Quality Rules
1. **Always filter `mv_content_freshness`** with `type != 'email'` — email content is internal
2. **Treat `intelligence.statistics` as unverified** unless `source_verified = true`
3. **Require `confidence_score >= 0.6`** for classification outputs from `v_content_classified`
4. **Quote attribution**: Always include `speaker_name`, `source_title`, and `published_date` from `mv_notable_quotes`
5. **Freshness**: Check `published_date` or `collected_date` — data older than 12 months should be flagged
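Rules 2 and 5 can be applied together in one template — a sketch against the same `intelligence.statistics` table, with the 12-month cutoff surfaced as a flag:

```sql
-- fresh-verified-statistics: verified stats, flagging anything older than 12 months
SELECT stat_text, stat_value, source_name, collected_date,
       (collected_date < CURRENT_DATE - INTERVAL '12 months') AS stale_flag
FROM intelligence.statistics
WHERE source_verified = true
ORDER BY collected_date DESC
LIMIT 30;
```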
### Confidence Labels
Use these labels when citing Compendium data in generated content:
| Label | When to use |
|-------|-------------|
| **Verified** | `source_verified = true` in `intelligence.statistics` |
| **Industry estimate** | Statistics without verification, from reputable sources |
| **Unverified — needs review** | `source_verified = false` or unknown provenance |
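A minimal shell sketch of this mapping (the reputable-source check is an assumption — the contract does not define how reputability is determined):

```shell
# Map a statistics row to a citation label per the table above.
# $1 = source_verified column value ("true", "false", or empty for NULL)
# $2 = "reputable" if the source is considered reputable (assumed flag)
confidence_label() {
  case "$1" in
    true)  echo "Verified" ;;
    false) echo "Unverified — needs review" ;;
    *)     if [ "$2" = "reputable" ]; then
             echo "Industry estimate"
           else
             echo "Unverified — needs review"
           fi ;;
  esac
}

confidence_label true          # Verified
confidence_label "" reputable  # Industry estimate
```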
---
## Fallback Behavior
When a Compendium service is unavailable, skills fall back gracefully:
| Service Down | Fallback Strategy |
|-------------|-------------------|
| PostgreSQL | Use Claude Code `WebSearch` for data; note "live data unavailable" |
| Search Router | Use Claude Code `WebSearch` directly |
| Scrape Router | Use Claude Code `WebFetch` for page content |
| Classification API | Skip classification; use manual assessment |
| Playwright MCP | Describe what would be checked; ask user to verify manually |
| Zen MCP | Use Claude's built-in analysis capabilities |
**Every skill must work standalone.** Compendium integration enhances output quality but is never a hard blocker (except skills marked `mode: required`).
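The pattern above can be sketched as a small shell wrapper (the command strings are illustrative; `--max-time` keeps a dead service from blocking the skill):

```shell
# Try a primary Compendium call; on failure, run the fallback and note degradation.
with_fallback() {
  primary=$1; fallback=$2
  if out=$(eval "$primary" 2>/dev/null); then
    printf '%s\n' "$out"
  else
    echo "note: live Compendium data unavailable, using fallback" >&2
    eval "$fallback"
  fi
}

# e.g. Search Router down -> fall back to a WebSearch placeholder
with_fallback \
  "curl -fsS --max-time 2 http://192.168.10.249:30099/health" \
  "echo 'FALLBACK: use WebSearch'"
```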
---
## Search Router API
```bash
# Search with default backends
curl "http://192.168.10.249:30099/search?q=hvac+efficiency+tips&format=json"
# Search with specific backend
curl "http://192.168.10.249:30099/search?q=hvac+contractor+reviews&backend=jina&format=json"
# Available backends: searxng-direct, searxng-proxied, jina, exa
```
---
## Scrape Router API
```bash
# Basic text extraction
curl -X POST http://192.168.10.249:30098/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/page", "extract_text": true}'
# With metadata extraction
curl -X POST http://192.168.10.249:30098/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "extract_text": true, "extract_metadata": true}'
# Chain: crawlee → camoufox → crawl4ai (automatic fallback)
```
---
## Classification API
```bash
# Classify content
curl -X POST http://192.168.10.249:30080/api/v2/content-classification/classify/ \
  -H "Content-Type: application/json" \
  -d '{
        "text": "How to diagnose a failed compressor...",
        "content_type": "article",
        "return_dimensions": true
      }'
# 8 classification dimensions:
# - topic_category, technical_level, audience_segment, content_format
# - quality_score, engagement_potential, commercial_intent, sentiment
```