feat: add GEO research data and platform-specific ranking factors

Add Princeton GEO study (KDD 2024) 9-method table with exact visibility percentages to SKILL.md. Add AI bot robots.txt configuration. Add keyword stuffing warning (-10% visibility).

Add platform-ranking-factors.md reference with per-platform details: Google AI Overviews (5-stage pipeline), ChatGPT (content-answer fit 55%, 30-day freshness 3.2x), Perplexity (3-layer RAG, FAQ Schema priority), Copilot (Bing index + MS ecosystem), Claude (Brave Search, 38K:1 crawl-to-refer ratio).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 17:14:33 -08:00
commit 5436b34b98
parent ce8dd3b41a
2 changed files with 308 additions and 4 deletions
--- a/skills/ai-seo/SKILL.md
+++ b/skills/ai-seo/SKILL.md
@@ -49,7 +49,9 @@ Gather this context (ask if not provided):
 | **Perplexity** | Always cites sources with links | Favors authoritative, recent, well-structured content |
 | **Gemini** | Google's AI assistant | Pulls from Google index + Knowledge Graph |
 | **Copilot** | Bing-powered AI search | Bing index + authoritative sources |
-| **Claude** | No live search (web search when enabled) | Training data + search results when available |
+| **Claude** | Brave Search (when enabled) | Training data + Brave search results |
+
+For detailed ranking factors per platform, see [references/platform-ranking-factors.md](references/platform-ranking-factors.md).

 ### Key Difference from Traditional SEO

@@ -110,6 +112,25 @@ For each priority page, verify:
 | Expert attribution (author name, credentials)? | |
 | Recently updated (within 6 months)? | |
 | Heading structure matches query patterns? | |
+| AI bots allowed in robots.txt? | |
+
+### Step 4: AI Bot Access Check
+
+Verify your robots.txt allows AI crawlers. If these bots are blocked, AI platforms can't cite you:
+
+```
+# AI bots to allow in robots.txt
+User-agent: GPTBot           # OpenAI (ChatGPT)
+User-agent: ChatGPT-User     # ChatGPT with browsing
+User-agent: PerplexityBot    # Perplexity
+User-agent: ClaudeBot        # Anthropic (Claude)
+User-agent: anthropic-ai     # Anthropic (Claude)
+User-agent: Google-Extended   # Google AI (Gemini training and grounding; AI Overviews follow Googlebot)
+User-agent: Bingbot          # Microsoft Copilot
+Allow: /
+```
+
+**Note:** Some companies block AI bots to prevent training on their content. That's a valid business decision — but if you block them, you won't get cited. You can selectively block training-only bots while allowing search bots.

 ---

@@ -147,15 +168,31 @@ For detailed templates for each block type, see [references/content-patterns.md]

 ### Pillar 2: Authority — Make Content Citable

-AI systems prefer sources they can trust. Build citation-worthiness:
+AI systems prefer sources they can trust. Build citation-worthiness.

-**Statistics and data** (40%+ citation boost)
+**The Princeton GEO research** (KDD 2024, evaluated on Perplexity.ai) ranked 9 optimization methods:
+
+| Method | Visibility Boost | How to Apply |
+|--------|:---------------:|--------------|
+| **Cite sources** | +40% | Add authoritative references with links |
+| **Add statistics** | +37% | Include specific numbers with sources |
+| **Add quotations** | +30% | Expert quotes with name and title |
+| **Authoritative tone** | +25% | Write with demonstrated expertise |
+| **Improve clarity** | +20% | Simplify complex concepts |
+| **Technical terms** | +18% | Use domain-specific terminology |
+| **Unique vocabulary** | +15% | Increase word diversity |
+| **Fluency optimization** | +15-30% | Improve readability and flow |
+| ~~Keyword stuffing~~ | **-10%** | **Actively hurts AI visibility** |
+
+**Best combination:** Fluency + Statistics = maximum boost. Low-ranking sites benefit even more — up to 115% visibility increase with citations.
+
+**Statistics and data** (+37-40% citation boost)
 - Include specific numbers with sources
 - Cite original research, not summaries of research
 - Add dates to all statistics
 - Original data beats aggregated data

-**Expert attribution** (25-30% citation boost)
+**Expert attribution** (+25-30% citation boost)
 - Named authors with credentials
 - Expert quotes with titles and organizations
 - "According to [Source]" framing for claims
@@ -324,6 +361,8 @@ Monthly manual check:
 - **Gating all content** — AI can't access gated content. Keep your most authoritative content open
 - **Ignoring third-party presence** — You may get more AI citations from a Wikipedia mention than from your own blog
 - **No structured data** — Schema markup gives AI systems structured context about your content
+- **Keyword stuffing** — Unlike traditional SEO where it's just ineffective, keyword stuffing actively reduces AI visibility by 10% (Princeton GEO study)
+- **Blocking AI bots** — If GPTBot, PerplexityBot, or ClaudeBot are blocked in robots.txt, those platforms can't cite you
 - **Generic content without data** — "We're the best" won't get cited. "Our customers see 3x improvement in [metric]" will
 - **Forgetting to monitor** — You can't improve what you don't measure. Check AI visibility monthly at minimum
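
Review note: the Step 4 check and the "Blocking AI bots" item above could be automated rather than eyeballed. The following is a minimal sketch, not part of this commit, built on Python's standard `urllib.robotparser`. The function name `blocked_ai_bots`, the `SAMPLE` file, and the `example.com` URL are hypothetical; the bot list is copied from Step 4.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens listed in Step 4 of SKILL.md.
AI_BOTS = [
    "GPTBot",
    "ChatGPT-User",
    "PerplexityBot",
    "ClaudeBot",
    "anthropic-ai",
    "Google-Extended",
    "Bingbot",
]


def blocked_ai_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI bots that this robots.txt text forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt that blocks only GPTBot.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_bots(SAMPLE))  # ['GPTBot']
```

For a live site, `RobotFileParser.set_url(...)` followed by `read()` fetches the file instead of parsing a string. The standard-library parser may not interpret edge cases the way each crawler does, so treat the result as a first pass.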

--- a/skills/ai-seo/references/platform-ranking-factors.md
+++ b/skills/ai-seo/references/platform-ranking-factors.md
@@ -0,0 +1,265 @@
+# Platform-Specific Ranking Factors
+
+How each AI search platform selects sources and what to optimize for each.
+
+Sources: Princeton GEO study (KDD 2024), SE Ranking study (129K domains), Ziptie content-answer fit analysis (400K pages).
+
+---
+
+## Quick Reference
+
+| Platform | Primary Index | Key Factor | Unique Requirement |
+|----------|--------------|------------|-------------------|
+| **Google AI Overviews** | Google | E-E-A-T + structured data | Knowledge Graph presence |
+| **ChatGPT** | Bing-based web | Domain authority + freshness | Content-answer fit |
+| **Perplexity** | Own + Google | Semantic relevance | FAQ Schema, PDF hosting |
+| **Copilot** | Bing | Bing indexing | Microsoft ecosystem presence |
+| **Claude** | Brave Search | Factual density | Brave Search indexing |
+| **Gemini** | Google | Google index + Knowledge Graph | Structured data |
+
+---
+
+## Google AI Overviews
+
+Google's AI Overviews synthesize answers from multiple sources using a 5-stage pipeline.
+
+### How Source Selection Works
+
+1. **Retrieval** — Identify candidate sources from Google index
+2. **Semantic ranking** — Evaluate topical relevance
+3. **LLM re-ranking** — Assess contextual fit using Gemini
+4. **E-E-A-T evaluation** — Filter for expertise, authority, trust
+5. **Data fusion** — Synthesize from multiple sources with citations
+
+### Key Stats
+
+| Signal | Impact |
+|--------|--------|
+| Authoritative citations in content | +132% visibility |
+| Authoritative tone | +89% visibility |
+| Structured data (Schema) | +30-40% visibility |
+| Overlap with traditional Top 10 | Only 15% (AI Overviews cite different pages) |
+
+### What to Optimize
+
+- Implement comprehensive Schema markup (Article, FAQPage, HowTo, Product)
+- Build topical authority with content clusters and internal linking
+- Include authoritative citations and references in content
+- Add E-E-A-T signals (author bios, credentials, experience)
+- Target informational "how-to" and "what is" queries
+- Ensure content is in Google's Knowledge Graph (Wikipedia helps)
+
+---
+
+## ChatGPT (with Search)
+
+ChatGPT uses a Bing-based web index for real-time search, combined with its training data.
+
+### How Source Selection Works
+
+Two-phase system:
+1. **Pre-training knowledge** — Built from training data (Wikipedia, books, web)
+2. **Real-time retrieval** — Web browsing for current information
+
+### Ranking Factor Weights (SE Ranking Study, 129K Domains)
+
+| Factor | Weight |
+|--------|--------|
+| Authority & credibility | ~40% |
+| Content quality & utility | ~35% |
+| Platform trust | ~25% |
+
+### Content-Answer Fit Analysis (400K Pages Study)
+
+| Factor | Relevance |
+|--------|-----------|
+| **Content-answer fit** | 55% — most important; match ChatGPT's response style |
+| **On-page structure** | 14% — clear headings, formatting |
+| **Domain authority** | 12% — helps retrieval, not citation |
+| **Query relevance** | 12% — match user intent |
+| **Content consensus** | 7% — agreement among sources |
+
+### Key Stats
+
+| Metric | Impact |
+|--------|--------|
+| >350K referring domains | 8.4 average citations |
+| Domain trust score 97-100 | 8.4 citations (vs 6 for 91-96) |
+| Content updated within 30 days | 3.2x more citations |
+| Branded vs third-party domains | Branded cited 11.1 points more |
+
+### Top Citation Sources
+
+1. Wikipedia (7.8%)
+2. Reddit (1.8%)
+3. Forbes (1.1%)
+4. Brand official sites (variable)
+5. Academic sources (variable)
+
+### What to Optimize
+
+- Build a strong backlink profile (quality over quantity, >350K referring domains is elite)
+- Update content frequently (within 30 days for competitive topics)
+- Match ChatGPT's conversational answer style in your content
+- Include verifiable statistics with citations
+- Use clear H1/H2/H3 heading structure
+- Build a high domain trust score
+
+---
+
+## Perplexity AI
+
+Perplexity always cites its sources with links. It uses Retrieval-Augmented Generation (RAG) with a 3-layer reranking system.
+
+### How Source Selection Works
+
+1. **Layer 1 (L1)** — Basic relevance retrieval
+2. **Layer 2 (L2)** — Traditional ranking factors scoring
+3. **Layer 3 (L3)** — ML models for quality evaluation (can discard entire result sets)
+
+### Key Ranking Signals
+
+| Signal | Details |
+|--------|---------|
+| Authoritative domain lists | Manual lists: Amazon, GitHub, academic sites get inherent boost |
+| Freshness | Time decay algorithm; new content evaluated quickly |
+| Semantic relevance | Content similarity to query (not keyword matching) |
+| Topical weighting | Tech, AI, Science topics get visibility multipliers |
+| Early engagement | First clicks on new posts significantly boost visibility |
+
+### Unique to Perplexity
+
+- **FAQ Schema (JSON-LD)** — Pages with FAQ blocks are cited more often
+- **PDF documents** — Publicly hosted PDFs are prioritized for citation
+- **Content velocity** — Speed of publishing matters more than keyword density
+- **Semantic payloads** — Clear, atomic paragraphs preferred (self-contained)
+
+### What to Optimize
+
+- Allow PerplexityBot in robots.txt
+- Implement FAQPage Schema markup
+- Create publicly accessible PDF resources (whitepapers, guides)
+- Use Article schema with timestamps
+- Focus on semantic relevance over keywords
+- Build topical authority in your niche
+- Write clear, self-contained paragraphs
+
+---
+
+## Microsoft Copilot
+
+Copilot is integrated into Edge, Windows, Microsoft 365, and Bing Search. It uses the **Bing Index** as its primary data source.
+
+### Key Ranking Signals
+
+| Signal | Details |
+|--------|---------|
+| Bing indexing | Must be indexed by Bing (required baseline) |
+| Microsoft ecosystem | LinkedIn, GitHub mentions provide a boost |
+| Page speed | < 2 seconds load time |
+| Schema markup | Helps Copilot understand content context |
+| Entity clarity | Clear definitions of entities and concepts |
+
+### What to Optimize
+
+- Submit site to Bing Webmaster Tools
+- Use IndexNow for faster indexing of new content
+- Optimize page speed (< 2 seconds)
+- Write clear entity definitions in content
+- Build presence on LinkedIn and GitHub
+- Ensure Bingbot can crawl all important pages
+
+---
+
+## Claude AI
+
+Claude uses **Brave Search** (not Google or Bing) when web search is enabled.
+
+### Key Characteristics
+
+| Signal | Details |
+|--------|---------|
+| Brave Index | Must be indexed by Brave Search |
+| Factual density | Data-rich content strongly preferred |
+| Structural clarity | Easy to extract information |
+| Source authority | Trustworthy, well-sourced content |
+| Selectivity | Crawl-to-refer ratio of 38,065:1 (extremely selective) |
+
+Claude consumes vast amounts of content but cites very selectively. Quality and relevance are critical.
+
+### What to Optimize
+
+- Ensure Brave Search can find your content
+- Allow ClaudeBot and anthropic-ai in robots.txt
+- Create content with high factual density (specific numbers, sources)
+- Use clear, extractable structure
+- Cite authoritative sources
+- Focus on being the most factually accurate source for your topic
+
+---
+
+## robots.txt Configuration
+
+Allow all major AI bots:
+
+```
+# Search engine bots
+User-agent: Googlebot
+Allow: /
+
+User-agent: Bingbot
+Allow: /
+
+# AI search bots
+User-agent: GPTBot
+Allow: /
+
+User-agent: ChatGPT-User
+Allow: /
+
+User-agent: PerplexityBot
+Allow: /
+
+User-agent: ClaudeBot
+Allow: /
+
+User-agent: anthropic-ai
+Allow: /
+
+User-agent: Google-Extended
+Allow: /
+
+# Sitemap
+Sitemap: https://example.com/sitemap.xml
+```
+
+### Selective Blocking
+
+If you want to allow AI search citation but block AI training:
+- **GPTBot** — Used by OpenAI for both search and training. Blocking prevents ChatGPT citation.
+- **Google-Extended** — Controls whether your content is used for Gemini training and grounding. Blocking it doesn't affect Google Search, including AI Overviews, which follow Googlebot rules.
+- **CCBot** — Used by Common Crawl for AI training datasets. Safe to block if you only want search citation.
+
+---
+
+## Optimization Priority by Platform
+
+If you can't optimize for everything, prioritize by your audience:
+
+| Priority | If Your Audience Uses | Focus On |
+|----------|----------------------|----------|
+| 1 | Google (everyone) | AI Overviews: Schema, E-E-A-T, topical authority |
+| 2 | ChatGPT (tech, business) | Domain authority, freshness, content-answer fit |
+| 3 | Perplexity (researchers, early adopters) | FAQ Schema, semantic relevance, PDFs |
+| 4 | Copilot (enterprise, Microsoft shops) | Bing indexing, LinkedIn presence |
+| 5 | Claude (developers, analysts) | Brave indexing, factual density |
+
+### Universal Actions (Do These First)
+
+1. Allow all AI bots in robots.txt
+2. Implement Schema markup (FAQPage, Article, Organization)
+3. Include statistics with citations in content
+4. Update content regularly (within 30 days for competitive topics)
+5. Use clear heading structure (H1 > H2 > H3)
+6. Ensure page speed < 2 seconds
+7. Add author bios with credentials
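
Review note: universal action 2 (and the Perplexity section's FAQPage recommendation) could be illustrated with a small generator. This is a minimal sketch, not part of this commit, that builds schema.org `FAQPage` JSON-LD from question/answer pairs. The helper names `faq_jsonld` and `faq_script_tag` and the sample question are hypothetical.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def faq_script_tag(pairs):
    """Wrap the JSON-LD in a script tag ready to paste into a page's HTML."""
    payload = json.dumps(faq_jsonld(pairs), indent=2)
    # Escape "</" so answer text cannot close the script element early;
    # "<\/" is still valid JSON and decodes back to "</".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical sample content.
print(faq_script_tag([("What is GEO?", "Generative engine optimization.")]))
```

Generating the markup from the same question/answer data that renders the visible FAQ keeps the two in sync, which matters because structured data is expected to match on-page content.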