feat: add GEO research data and platform-specific ranking factors

Add Princeton GEO study (KDD 2024) 9-method table with exact visibility
percentages to SKILL.md. Add AI bot robots.txt configuration. Add keyword
stuffing warning (-10% visibility). Add platform-ranking-factors.md reference
with per-platform details: Google AI Overviews (5-stage pipeline), ChatGPT
(content-answer fit 55%, 30-day freshness 3.2x), Perplexity (3-layer RAG,
FAQ Schema priority), Copilot (Bing index + MS ecosystem), Claude (Brave
Search, 38K:1 crawl-to-refer ratio).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Corey Haines 2026-02-18 17:14:33 -08:00
parent ce8dd3b41a
commit 5436b34b98
2 changed files with 308 additions and 4 deletions


@@ -49,7 +49,9 @@ Gather this context (ask if not provided):
| **Perplexity** | Always cites sources with links | Favors authoritative, recent, well-structured content |
| **Gemini** | Google's AI assistant | Pulls from Google index + Knowledge Graph |
| **Copilot** | Bing-powered AI search | Bing index + authoritative sources |
| **Claude** | No live search (web search when enabled) | Training data + search results when available |
| **Claude** | Brave Search (when enabled) | Training data + Brave search results |
For detailed ranking factors per platform, see [references/platform-ranking-factors.md](references/platform-ranking-factors.md).
### Key Difference from Traditional SEO
@@ -110,6 +112,25 @@ For each priority page, verify:
| Expert attribution (author name, credentials)? | |
| Recently updated (within 6 months)? | |
| Heading structure matches query patterns? | |
| AI bots allowed in robots.txt? | |
### Step 4: AI Bot Access Check
Verify your robots.txt allows AI crawlers. If these bots are blocked, AI platforms can't cite you:
```
# AI bots to allow in robots.txt
User-agent: GPTBot # OpenAI (ChatGPT)
User-agent: ChatGPT-User # ChatGPT with browsing
User-agent: PerplexityBot # Perplexity
User-agent: ClaudeBot # Anthropic (Claude)
User-agent: anthropic-ai # Anthropic (Claude)
User-agent: Google-Extended # Google (Gemini training and grounding)
User-agent: Bingbot # Microsoft Copilot
Allow: /
```
**Note:** Some companies block AI bots to prevent training on their content. That's a valid business decision — but if you block them, you won't get cited. You can selectively block training-only bots while allowing search bots.
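To run this check without reading the file by hand, a minimal Python sketch using the standard library's `urllib.robotparser` might look like the following. The bot list is copied from the block above; the `SAMPLE` robots.txt and the helper name `check_ai_bot_access` are hypothetical and exist only for this example.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens copied from the list above.
AI_BOTS = [
    "GPTBot",
    "ChatGPT-User",
    "PerplexityBot",
    "ClaudeBot",
    "anthropic-ai",
    "Google-Extended",
    "Bingbot",
]


def check_ai_bot_access(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {bot: allowed} for `path`, given the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_BOTS}


# Hypothetical robots.txt that blocks one AI bot and allows everything else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

report = check_ai_bot_access(SAMPLE)
blocked = [bot for bot, allowed in report.items() if not allowed]
print("Blocked AI bots:", blocked)
```

To audit a live site, read its `/robots.txt` into a string and pass that to `check_ai_bot_access` instead of `SAMPLE`.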
---
@@ -147,15 +168,31 @@ For detailed templates for each block type, see [references/content-patterns.md]
### Pillar 2: Authority — Make Content Citable
AI systems prefer sources they can trust. Build citation-worthiness:
AI systems prefer sources they can trust. Build citation-worthiness.
**Statistics and data** (40%+ citation boost)
**The Princeton GEO research** (KDD 2024, with results validated on Perplexity.ai) ranked 9 optimization methods:
| Method | Visibility Boost | How to Apply |
|--------|:---------------:|--------------|
| **Cite sources** | +40% | Add authoritative references with links |
| **Add statistics** | +37% | Include specific numbers with sources |
| **Add quotations** | +30% | Expert quotes with name and title |
| **Authoritative tone** | +25% | Write with demonstrated expertise |
| **Improve clarity** | +20% | Simplify complex concepts |
| **Technical terms** | +18% | Use domain-specific terminology |
| **Unique vocabulary** | +15% | Increase word diversity |
| **Fluency optimization** | +15-30% | Improve readability and flow |
| ~~Keyword stuffing~~ | **-10%** | **Actively hurts AI visibility** |
**Best combination:** Fluency optimization combined with statistics gave the largest boost. Low-ranking sites benefit even more, with up to a 115% visibility increase from citing sources.
**Statistics and data** (+37-40% citation boost)
- Include specific numbers with sources
- Cite original research, not summaries of research
- Add dates to all statistics
- Original data beats aggregated data
**Expert attribution** (25-30% citation boost)
**Expert attribution** (+25-30% citation boost)
- Named authors with credentials
- Expert quotes with titles and organizations
- "According to [Source]" framing for claims
@@ -324,6 +361,8 @@ Monthly manual check:
- **Gating all content** — AI can't access gated content. Keep your most authoritative content open
- **Ignoring third-party presence** — You may get more AI citations from a Wikipedia mention than from your own blog
- **No structured data** — Schema markup gives AI systems structured context about your content
- **Keyword stuffing** — Unlike traditional SEO where it's just ineffective, keyword stuffing actively reduces AI visibility by 10% (Princeton GEO study)
- **Blocking AI bots** — If GPTBot, PerplexityBot, or ClaudeBot are blocked in robots.txt, those platforms can't cite you
- **Generic content without data** — "We're the best" won't get cited. "Our customers see 3x improvement in [metric]" will
- **Forgetting to monitor** — You can't improve what you don't measure. Check AI visibility monthly at minimum


@@ -0,0 +1,265 @@
# Platform-Specific Ranking Factors
How each AI search platform selects sources and what to optimize for each.
Sources: Princeton GEO study (KDD 2024), SE Ranking study (129K domains), Ziptie content-answer fit analysis (400K pages).
---
## Quick Reference
| Platform | Primary Index | Key Factor | Unique Requirement |
|----------|--------------|------------|-------------------|
| **Google AI Overviews** | Google | E-E-A-T + structured data | Knowledge Graph presence |
| **ChatGPT** | Bing-based web | Domain authority + freshness | Content-answer fit |
| **Perplexity** | Own + Google | Semantic relevance | FAQ Schema, PDF hosting |
| **Copilot** | Bing | Bing indexing | Microsoft ecosystem presence |
| **Claude** | Brave Search | Factual density | Brave Search indexing |
| **Gemini** | Google | Google index + Knowledge Graph | Structured data |
---
## Google AI Overviews
Google's AI Overviews synthesize answers from multiple sources using a 5-stage pipeline.
### How Source Selection Works
1. **Retrieval** — Identify candidate sources from Google index
2. **Semantic ranking** — Evaluate topical relevance
3. **LLM re-ranking** — Assess contextual fit using Gemini
4. **E-E-A-T evaluation** — Filter for expertise, authority, trust
5. **Data fusion** — Synthesize from multiple sources with citations
### Key Stats
| Signal | Impact |
|--------|--------|
| Authoritative citations in content | +132% visibility |
| Authoritative tone | +89% visibility |
| Structured data (Schema) | +30-40% visibility |
| Overlap with traditional Top 10 | Only 15% (AI Overviews cite different pages) |
### What to Optimize
- Implement comprehensive Schema markup (Article, FAQPage, HowTo, Product)
- Build topical authority with content clusters and internal linking
- Include authoritative citations and references in content
- Add E-E-A-T signals (author bios, credentials, experience)
- Target informational "how-to" and "what is" queries
- Ensure content is in Google's Knowledge Graph (Wikipedia helps)
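As a rough illustration of the Schema and E-E-A-T items above, the following Python sketch builds an Article JSON-LD block with a named author, a job title, and cited references. The helper name `article_jsonld` and every sample value (headline, author, dates, URL) are placeholders invented for this example.

```python
import json


def article_jsonld(headline, author_name, author_title, published, modified, citations):
    """Build a schema.org Article object as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # Author details carry the expertise signals mentioned above.
        "author": {
            "@type": "Person",
            "name": author_name,
            "jobTitle": author_title,
        },
        "datePublished": published,
        "dateModified": modified,
        # References the article relies on, as URLs.
        "citation": list(citations),
    }


# Placeholder values for illustration only.
article = article_jsonld(
    headline="What Is Generative Engine Optimization?",
    author_name="Jane Doe",
    author_title="Head of Research",
    published="2026-01-05",
    modified="2026-02-10",
    citations=["https://example.com/original-study"],
)

# Wrap the JSON in the script tag that goes into the page's HTML.
snippet = '<script type="application/ld+json">\n' + json.dumps(article, indent=2) + "\n</script>"
print(snippet)
```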
---
## ChatGPT (with Search)
ChatGPT uses a Bing-based web index for real-time search, combined with its training data.
### How Source Selection Works
Two-phase system:
1. **Pre-training knowledge** — Built from training data (Wikipedia, books, web)
2. **Real-time retrieval** — Web browsing for current information
### Ranking Factor Weights (SE Ranking Study, 129K Domains)
| Factor | Weight |
|--------|--------|
| Authority & credibility | ~40% |
| Content quality & utility | ~35% |
| Platform trust | ~25% |
### Content-Answer Fit Analysis (400K Pages Study)
| Factor | Relevance |
|--------|-----------|
| **Content-answer fit** | 55% — most important; match ChatGPT's response style |
| **On-page structure** | 14% — clear headings, formatting |
| **Domain authority** | 12% — helps retrieval, not citation |
| **Query relevance** | 12% — match user intent |
| **Content consensus** | 7% — agreement among sources |
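One way to use these shares in a page audit is as weights in a simple composite score. The Python sketch below is an assumption about how one might apply the numbers, not something the study defines: the 0-to-1 self-assessed scores, the key names, and the function `weighted_audit_score` are all hypothetical.

```python
# Relevance shares from the table above, expressed as fractions.
FIT_WEIGHTS = {
    "content_answer_fit": 0.55,
    "on_page_structure": 0.14,
    "domain_authority": 0.12,
    "query_relevance": 0.12,
    "content_consensus": 0.07,
}


def weighted_audit_score(scores: dict[str, float]) -> float:
    """Combine per-factor scores (each 0..1) into one weighted number."""
    missing = FIT_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    return sum(FIT_WEIGHTS[name] * scores[name] for name in FIT_WEIGHTS)


# Hypothetical page: strong on everything except content-answer fit.
page_scores = {
    "content_answer_fit": 0.4,
    "on_page_structure": 0.9,
    "domain_authority": 0.9,
    "query_relevance": 0.9,
    "content_consensus": 0.9,
}
audit_score = weighted_audit_score(page_scores)
print(f"Weighted audit score: {audit_score:.3f}")
```

Because content-answer fit carries more than half the weight, a weak score there drags the total down even when every other factor is strong.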
### Key Stats
| Metric | Impact |
|--------|--------|
| >350K referring domains | 8.4 average citations |
| Domain trust score 97-100 | 8.4 citations (vs 6 for 91-96) |
| Content updated within 30 days | 3.2x more citations |
| Branded vs third-party domains | Branded domains cited 11.1 points more often |
### Top Citation Sources
1. Wikipedia (7.8%)
2. Reddit (1.8%)
3. Forbes (1.1%)
4. Brand official sites (variable)
5. Academic sources (variable)
### What to Optimize
- Build a strong backlink profile (quality over quantity; sites with >350K referring domains averaged 8.4 citations)
- Update content frequently (within 30 days for competitive topics)
- Match ChatGPT's conversational answer style in your content
- Include verifiable statistics with citations
- Use clear H1/H2/H3 heading structure
- Build high domain trust score
---
## Perplexity AI
Perplexity always cites its sources with links. It uses Retrieval-Augmented Generation (RAG) with a 3-layer reranking system.
### How Source Selection Works
1. **Layer 1 (L1)** — Basic relevance retrieval
2. **Layer 2 (L2)** — Traditional ranking factors scoring
3. **Layer 3 (L3)** — ML models for quality evaluation (can discard entire result sets)
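The three layers above can be pictured as a filter, a sort, and a quality gate. The Python sketch below is a toy model only: the `Candidate` fields, the thresholds, and the `rerank` function are invented for illustration and say nothing about Perplexity's actual implementation or signals.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    relevance: float  # hypothetical L1 signal, 0..1
    ranking: float    # hypothetical L2 signal, 0..1
    quality: float    # hypothetical L3 signal, 0..1


def rerank(candidates, l1_min=0.3, top_k=3, l3_min=0.5, min_survivors=2):
    """Toy three-stage rerank; all thresholds are illustrative assumptions."""
    # L1: keep candidates that clear a basic relevance bar.
    l1 = [c for c in candidates if c.relevance >= l1_min]
    # L2: order by the ranking score and keep the top_k.
    l2 = sorted(l1, key=lambda c: c.ranking, reverse=True)[:top_k]
    # L3: quality gate. If too few candidates pass, discard the whole set.
    l3 = [c for c in l2 if c.quality >= l3_min]
    if len(l3) < min_survivors:
        return []
    return l3


candidates = [
    Candidate("a", relevance=0.9, ranking=0.8, quality=0.9),
    Candidate("b", relevance=0.2, ranking=0.9, quality=0.9),
    Candidate("c", relevance=0.7, ranking=0.6, quality=0.7),
    Candidate("d", relevance=0.8, ranking=0.5, quality=0.1),
]
survivors = [c.url for c in rerank(candidates)]
print(survivors)
```

The point of the toy is the last step: a page can pass retrieval and ranking and still be dropped, along with its neighbours, if the final quality check fails.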
### Key Ranking Signals
| Signal | Details |
|--------|---------|
| Authoritative domain lists | Manual lists: Amazon, GitHub, academic sites get inherent boost |
| Freshness | Time decay algorithm; new content evaluated quickly |
| Semantic relevance | Content similarity to query (not keyword matching) |
| Topical weighting | Tech, AI, Science topics get visibility multipliers |
| Early engagement | First clicks on new posts significantly boost visibility |
### Unique to Perplexity
- **FAQ Schema (JSON-LD)** — Pages with FAQ blocks are cited more often
- **PDF documents** — Publicly hosted PDFs are prioritized for citation
- **Content velocity** — Speed of publishing matters more than keyword density
- **Semantic payloads** — Clear, atomic paragraphs preferred (self-contained)
### What to Optimize
- Allow PerplexityBot in robots.txt
- Implement FAQPage Schema markup
- Create publicly accessible PDF resources (whitepapers, guides)
- Use Article schema with timestamps
- Focus on semantic relevance over keywords
- Build topical authority in your niche
- Write clear, self-contained paragraphs
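For the FAQPage item above, a minimal Python sketch that turns question and answer pairs into JSON-LD could look like this. The helper name `faq_jsonld` and the sample questions are placeholders for illustration; the property names follow the schema.org FAQPage vocabulary.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage dict from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder content; keep each answer self-contained.
faq = faq_jsonld(
    [
        ("What is GEO?", "GEO is the practice of optimizing content so AI search platforms cite it."),
        ("Does keyword stuffing help?", "No. The Princeton GEO study measured a visibility drop."),
    ]
)

faq_snippet = '<script type="application/ld+json">\n' + json.dumps(faq, indent=2) + "\n</script>"
print(faq_snippet)
```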
---
## Microsoft Copilot
Copilot is integrated into Edge, Windows, Microsoft 365, and Bing Search. It uses the **Bing Index** as its primary data source.
### Key Ranking Signals
| Signal | Details |
|--------|---------|
| Bing indexing | Must be indexed by Bing (required baseline) |
| Microsoft ecosystem | LinkedIn, GitHub mentions provide a boost |
| Page speed | < 2 seconds load time |
| Schema markup | Helps Copilot understand content context |
| Entity clarity | Clear definitions of entities and concepts |
### What to Optimize
- Submit site to Bing Webmaster Tools
- Use IndexNow for faster indexing of new content
- Optimize page speed (< 2 seconds)
- Write clear entity definitions in content
- Build presence on LinkedIn and GitHub
- Ensure Bingbot can crawl all important pages
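An IndexNow submission is a single JSON POST. The sketch below builds (but does not send) such a request with Python's standard library, using the fields from the published IndexNow protocol (`host`, `key`, `keyLocation`, `urlList`). The host, key value, and URLs are placeholders, and the helper name `build_indexnow_request` is made up for this example.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host, key, urls):
    """Build an IndexNow POST request without sending it (hypothetical helper)."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Placeholder host, key, and URLs.
request = build_indexnow_request(
    host="example.com",
    key="0123456789abcdef",
    urls=["https://example.com/new-post", "https://example.com/updated-guide"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would perform the submission; that step is left out so the sketch has no network side effects.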
---
## Claude AI
Claude uses **Brave Search** (not Google or Bing) when web search is enabled.
### Key Characteristics
| Signal | Details |
|--------|---------|
| Brave Index | Must be indexed by Brave Search |
| Factual density | Data-rich content strongly preferred |
| Structural clarity | Easy to extract information |
| Source authority | Trustworthy, well-sourced content |
| Selectivity | Crawl-to-refer ratio of 38,065:1 (extremely selective) |
Claude consumes vast amounts of content but cites very selectively. Quality and relevance are critical.
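To put the ratio in perspective, 38,065 crawls per referral works out to a referral rate of roughly 0.0026%. The short Python sketch below does that arithmetic; the function name `expected_referrals` and the one-million-page example are illustrative assumptions, and the ratio is an aggregate figure rather than a prediction for any single site.

```python
# Crawl-to-refer ratio from the table above: crawls per one referral.
CRAWL_TO_REFER = 38_065


def expected_referrals(pages_crawled: int, ratio: int = CRAWL_TO_REFER) -> float:
    """Average referrals implied by the ratio for a given crawl volume."""
    return pages_crawled / ratio


referral_rate = 1 / CRAWL_TO_REFER
print(f"Referral rate: {referral_rate:.4%}")
print(f"Referrals per 1,000,000 crawls: {expected_referrals(1_000_000):.1f}")
```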
### What to Optimize
- Ensure Brave Search can find your content
- Allow ClaudeBot and anthropic-ai in robots.txt
- Create high factual density content (specific numbers, sources)
- Use clear, extractable structure
- Cite authoritative sources
- Focus on being the most factually accurate source for your topic
---
## robots.txt Configuration
Allow all major AI bots:
```
# Search engine bots
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# AI search bots
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: Google-Extended
Allow: /
# Sitemap
Sitemap: https://example.com/sitemap.xml
```
### Selective Blocking
If you want to allow AI search citation but block AI training:
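- **GPTBot**: OpenAI documents this as its training crawler, with OAI-SearchBot handling ChatGPT search and ChatGPT-User handling user-triggered fetches. Blocking GPTBot opts out of training; keep the other two allowed if you want ChatGPT citations.
- **Google-Extended**: Controls whether Google can use your content for Gemini training and grounding. Blocking it doesn't affect regular Google Search, and AI Overviews are a Search feature governed by Googlebot rather than this token.
- **CCBot**: Used by Common Crawl for AI training datasets. Safe to block if you only want search citation.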
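A selective policy can be generated and verified in a few lines. The Python sketch below writes a robots.txt that allows a set of search-facing bots and disallows a set of training-only bots, then checks the result with `urllib.robotparser`. Which bots go in which list is your decision; the split used here (blocking only CCBot) and the helper name `build_robots_txt` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(allow, block, sitemap=None):
    """Emit one group per bot: Allow: / for `allow`, Disallow: / for `block`."""
    lines = []
    for bot in allow:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    for bot in block:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip() + "\n"


# Example split (an assumption): keep search-facing bots, block a training-only bot.
robots_txt = build_robots_txt(
    allow=["ChatGPT-User", "PerplexityBot", "ClaudeBot", "Bingbot"],
    block=["CCBot"],
    sitemap="https://example.com/sitemap.xml",
)
print(robots_txt)

# Verify the generated file behaves as intended before deploying it.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
ccbot_allowed = parser.can_fetch("CCBot", "/")
perplexity_allowed = parser.can_fetch("PerplexityBot", "/")
print("CCBot allowed:", ccbot_allowed)
print("PerplexityBot allowed:", perplexity_allowed)
```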
---
## Optimization Priority by Platform
If you can't optimize for everything, prioritize by your audience:
| Priority | If Your Audience Uses | Focus On |
|----------|----------------------|----------|
| 1 | Google (everyone) | AI Overviews: Schema, E-E-A-T, topical authority |
| 2 | ChatGPT (tech, business) | Domain authority, freshness, content-answer fit |
| 3 | Perplexity (researchers, early adopters) | FAQ Schema, semantic relevance, PDFs |
| 4 | Copilot (enterprise, Microsoft shops) | Bing indexing, LinkedIn presence |
| 5 | Claude (developers, analysts) | Brave indexing, factual density |
### Universal Actions (Do These First)
1. Allow all AI bots in robots.txt
2. Implement Schema markup (FAQPage, Article, Organization)
3. Include statistics with citations in content
4. Update content regularly (within 30 days for competitive topics)
5. Use clear heading structure (H1 > H2 > H3)
6. Ensure page speed < 2 seconds
7. Add author bios with credentials
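Item 5 is easy to check automatically. The following Python sketch uses the standard library's `html.parser` to collect heading levels from a page and flag two common problems: a missing or repeated H1, and skipped levels (for example an H1 followed directly by an H3). The class and function names are hypothetical, and the two rules are one reasonable reading of "clear heading structure", not a platform requirement.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level (1-6) of every heading tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Matches h1..h6 only; tags such as <hr> or <header> are ignored.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(html: str) -> list[str]:
    """Return human-readable problems with the heading hierarchy."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    issues = []
    if levels.count(1) != 1:
        issues.append(f"expected exactly one h1, found {levels.count(1)}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"h{prev} is followed by h{cur} (skipped level)")
    return issues


# Placeholder page fragment with a skipped level.
sample_html = "<h1>Guide</h1><h3>Details</h3><h2>Summary</h2>"
for issue in heading_issues(sample_html):
    print(issue)
```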