feat: add GEO research data and platform-specific ranking factors
- Add Princeton GEO study (KDD 2024) 9-method table with exact visibility percentages to SKILL.md.
- Add AI bot robots.txt configuration.
- Add keyword stuffing warning (-10% visibility).
- Add platform-ranking-factors.md reference with per-platform details:
  - Google AI Overviews (5-stage pipeline)
  - ChatGPT (content-answer fit 55%, 30-day freshness 3.2x)
  - Perplexity (3-layer RAG, FAQ Schema priority)
  - Copilot (Bing index + MS ecosystem)
  - Claude (Brave Search, 38K:1 crawl-to-refer ratio)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent: ce8dd3b41a
commit: 5436b34b98
2 changed files with 308 additions and 4 deletions

@@ -49,7 +49,9 @@ Gather this context (ask if not provided):
| **Perplexity** | Always cites sources with links | Favors authoritative, recent, well-structured content |
| **Gemini** | Google's AI assistant | Pulls from Google index + Knowledge Graph |
| **Copilot** | Bing-powered AI search | Bing index + authoritative sources |
| **Claude** | Brave Search (when enabled) | Training data + Brave search results |

For detailed ranking factors per platform, see [references/platform-ranking-factors.md](references/platform-ranking-factors.md).

### Key Difference from Traditional SEO

@@ -110,6 +112,25 @@ For each priority page, verify:
| Expert attribution (author name, credentials)? | |
| Recently updated (within 6 months)? | |
| Heading structure matches query patterns? | |
| AI bots allowed in robots.txt? | |

### Step 4: AI Bot Access Check

Verify that your robots.txt allows AI crawlers. If these bots are blocked, AI platforms can't cite you:

```
# AI bots to allow in robots.txt
User-agent: GPTBot # OpenAI (training crawler)
User-agent: OAI-SearchBot # OpenAI (ChatGPT search results)
User-agent: ChatGPT-User # ChatGPT with browsing
User-agent: PerplexityBot # Perplexity
User-agent: ClaudeBot # Anthropic (Claude)
User-agent: anthropic-ai # Anthropic (Claude)
User-agent: Google-Extended # Google AI (Gemini apps; AI Overviews follow Googlebot)
User-agent: Bingbot # Microsoft Copilot
Allow: /
```

**Note:** Some companies block AI bots to prevent training on their content. That's a valid business decision — but if you block them, you won't get cited. You can selectively block training-only bots while allowing search bots.
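The Step 4 check can also be scripted. Below is a minimal sketch using Python's standard-library `urllib.robotparser`. The `AI_BOTS` list mirrors the user agents above, and the helper names `check_ai_access` and `fetch_and_check` are hypothetical. The standard-library parser uses simple first-match prefix rules and ignores `*` wildcards inside paths, so its answers can differ from how Google or OpenAI interpret the same file. Treat it as a quick first pass.

```python
from urllib.robotparser import RobotFileParser

# User agents from the robots.txt example above; extend as vendors add crawlers.
AI_BOTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "PerplexityBot",
    "ClaudeBot",
    "anthropic-ai",
    "Google-Extended",
    "Bingbot",
]


def check_ai_access(robots_txt: str, page_url: str) -> dict[str, bool]:
    """Hypothetical helper: map each AI bot to whether robots_txt lets it fetch page_url."""
    parser = RobotFileParser()
    # parse() also records a "last checked" time, which can_fetch() requires.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, page_url) for bot in AI_BOTS}


def fetch_and_check(site: str, page_url: str) -> dict[str, bool]:
    """Hypothetical helper: download https://<site>/robots.txt and run the same check."""
    parser = RobotFileParser(f"https://{site}/robots.txt")
    parser.read()  # network call; a 401/403 response is treated as "disallow all"
    return {bot: parser.can_fetch(bot, page_url) for bot in AI_BOTS}


if __name__ == "__main__":
    sample = (
        "User-agent: GPTBot\n"
        "Disallow: /\n"
        "\n"
        "User-agent: *\n"
        "Allow: /\n"
    )
    for bot, ok in check_ai_access(sample, "https://example.com/blog/post").items():
        print(f"{bot:16} {'allowed' if ok else 'BLOCKED'}")
```

Run `fetch_and_check("yourdomain.com", "https://yourdomain.com/some-page")` against your live site to see which crawlers your current file admits.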

---

@@ -147,15 +168,31 @@ For detailed templates for each block type, see [references/content-patterns.md]
### Pillar 2: Authority — Make Content Citable

AI systems prefer sources they can trust. Build citation-worthiness.

**The Princeton GEO research** (KDD 2024, tested on generative engines including Perplexity.ai) ranked 9 optimization methods:

| Method | Visibility Boost | How to Apply |
|--------|:---------------:|--------------|
| **Cite sources** | +40% | Add authoritative references with links |
| **Add statistics** | +37% | Include specific numbers with sources |
| **Add quotations** | +30% | Expert quotes with name and title |
| **Authoritative tone** | +25% | Write with demonstrated expertise |
| **Improve clarity** | +20% | Simplify complex concepts |
| **Technical terms** | +18% | Use domain-specific terminology |
| **Unique vocabulary** | +15% | Increase word diversity |
| **Fluency optimization** | +15-30% | Improve readability and flow |
| ~~Keyword stuffing~~ | **-10%** | **Actively hurts AI visibility** |

**Best combination:** Fluency optimization plus statistics gave the largest boost. Lower-ranked sites benefit even more: citing sources raised their visibility by up to 115%.

**Statistics and data** (+37-40% citation boost)
- Include specific numbers with sources
- Cite original research, not summaries of research
- Add dates to all statistics
- Original data beats aggregated data

**Expert attribution** (+25-30% citation boost)
- Named authors with credentials
- Expert quotes with titles and organizations
- "According to [Source]" framing for claims
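Author credentials can also be exposed as structured data, so parsers do not have to infer them from the byline. A minimal sketch, assuming your pages are generated with Python: it emits schema.org `Article` JSON-LD with a named `Person` author (`jobTitle`, `worksFor`) plus `datePublished` and `dateModified`. The helper names and every example value are hypothetical placeholders.

```python
import json
from datetime import date


def article_jsonld(headline: str, author_name: str, job_title: str,
                   organization: str, published: date, modified: date) -> str:
    """Hypothetical helper: schema.org Article JSON-LD with an attributed author."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {
            "@type": "Person",
            "name": author_name,
            "jobTitle": job_title,  # the credential readers see in the byline
            "worksFor": {"@type": "Organization", "name": organization},
        },
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the <script> element that is embedded in the page HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    print(as_script_tag(article_jsonld(
        headline="How We Cut Onboarding Time in Half",  # placeholder values
        author_name="Jane Doe",
        job_title="Head of Customer Success",
        organization="Example Corp",
        published=date(2024, 1, 15),
        modified=date(2024, 6, 1),
    )))
```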

@@ -324,6 +361,8 @@ Monthly manual check:
- **Gating all content** — AI can't access gated content. Keep your most authoritative content open
- **Ignoring third-party presence** — You may get more AI citations from a Wikipedia mention than from your own blog
- **No structured data** — Schema markup gives AI systems structured context about your content
- **Keyword stuffing** — Traditional search engines already treat it as spam, and in AI search it actively reduces visibility by about 10% (Princeton GEO study)
- **Blocking AI bots** — If crawlers such as OAI-SearchBot (ChatGPT search), PerplexityBot, or ClaudeBot are blocked in robots.txt, those platforms may be unable to cite you
- **Generic content without data** — "We're the best" won't get cited. "Our customers see 3x improvement in [metric]" will
- **Forgetting to monitor** — You can't improve what you don't measure. Check AI visibility monthly at minimum

skills/ai-seo/references/platform-ranking-factors.md (Normal file, 265 lines added)

@@ -0,0 +1,265 @@
# Platform-Specific Ranking Factors

How each AI search platform selects sources and what to optimize for each.

Sources: Princeton GEO study (KDD 2024), SE Ranking study (129K domains), Ziptie content-answer fit analysis (400K pages).

---

## Quick Reference

| Platform | Primary Index | Key Factor | Unique Requirement |
|----------|--------------|------------|-------------------|
| **Google AI Overviews** | Google | E-E-A-T + structured data | Knowledge Graph presence |
| **ChatGPT** | Bing-based web | Domain authority + freshness | Content-answer fit |
| **Perplexity** | Own + Google | Semantic relevance | FAQ Schema, PDF hosting |
| **Copilot** | Bing | Bing indexing | Microsoft ecosystem presence |
| **Claude** | Brave Search | Factual density | Brave Search indexing |
| **Gemini** | Google | Google index + Knowledge Graph | Structured data |

---

## Google AI Overviews

Google's AI Overviews synthesize answers from multiple sources using a 5-stage pipeline.

### How Source Selection Works

1. **Retrieval** — Identify candidate sources from the Google index
2. **Semantic ranking** — Evaluate topical relevance
3. **LLM re-ranking** — Assess contextual fit using Gemini
4. **E-E-A-T evaluation** — Filter for experience, expertise, authoritativeness, and trustworthiness
5. **Data fusion** — Synthesize an answer from multiple sources, with citations

### Key Stats

| Signal | Impact |
|--------|--------|
| Authoritative citations in content | +132% visibility |
| Authoritative tone | +89% visibility |
| Structured data (Schema) | +30-40% visibility |
| Overlap with traditional top 10 | Only 15% (AI Overviews often cite pages outside the organic top 10) |

### What to Optimize

- Implement comprehensive Schema markup (Article, FAQPage, HowTo, Product)
- Build topical authority with content clusters and internal linking
- Include authoritative citations and references in content
- Add E-E-A-T signals (author bios, credentials, experience)
- Target informational "how-to" and "what is" queries
- Establish your brand as an entity in Google's Knowledge Graph (a Wikipedia article helps)
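For the Knowledge Graph item, a common starting point is `Organization` markup whose `sameAs` property lists profiles that describe the same entity (Wikipedia, LinkedIn, GitHub). This is a sketch under stated assumptions: all names and URLs are placeholders, `organization_jsonld` is a hypothetical helper, and markup only helps Google connect signals that already exist. On its own it is unlikely to create a Knowledge Graph entry.

```python
import json


def organization_jsonld(name: str, url: str, logo: str, same_as: list[str]) -> dict:
    """Hypothetical helper: schema.org Organization JSON-LD linking a brand to its profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        # sameAs points at authoritative pages that describe the same entity.
        "sameAs": same_as,
    }


if __name__ == "__main__":
    org = organization_jsonld(
        name="Example Corp",  # placeholder values throughout
        url="https://example.com",
        logo="https://example.com/logo.png",
        same_as=[
            "https://en.wikipedia.org/wiki/Example_Corp",
            "https://www.linkedin.com/company/example-corp",
            "https://github.com/example-corp",
        ],
    )
    print(json.dumps(org, indent=2))
```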

---

## ChatGPT (with Search)

ChatGPT uses a Bing-based web index for real-time search, combined with its training data.

### How Source Selection Works

Two-phase system:
1. **Pre-training knowledge** — Built from training data (Wikipedia, books, web)
2. **Real-time retrieval** — Web browsing for current information

### Ranking Factor Weights (SE Ranking Study, 129K Domains)

| Factor | Weight |
|--------|--------|
| Authority & credibility | ~40% |
| Content quality & utility | ~35% |
| Platform trust | ~25% |

### Content-Answer Fit Analysis (400K Pages Study)

| Factor | Weight |
|--------|-----------|
| **Content-answer fit** | 55% — most important; match ChatGPT's response style |
| **On-page structure** | 14% — clear headings, formatting |
| **Domain authority** | 12% — helps retrieval, not citation |
| **Query relevance** | 12% — match user intent |
| **Content consensus** | 7% — agreement among sources |

### Key Stats

| Metric | Impact |
|--------|--------|
| >350K referring domains | 8.4 average citations |
| Domain trust score 97-100 | 8.4 average citations (vs. 6 for scores 91-96) |
| Content updated within 30 days | 3.2x more citations |
| Branded vs third-party domains | Branded cited 11.1 points more |

### Top Citation Sources

1. Wikipedia (7.8%)
2. Reddit (1.8%)
3. Forbes (1.1%)
4. Brand official sites (variable)
5. Academic sources (variable)

### What to Optimize

- Build a strong, high-quality backlink profile (for reference, >350K referring domains is the elite tier)
- Update content frequently (within 30 days for competitive topics)
- Match ChatGPT's conversational answer style in your content
- Include verifiable statistics with citations
- Use clear H1/H2/H3 heading structure
- Build a high domain trust score
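The 30-day freshness target can be turned into a recurring audit. A minimal sketch, assuming your sitemap carries accurate `<lastmod>` values (or that you can export last-modified dates from your CMS). `lastmod_from_sitemap` and `stale_pages` are hypothetical helpers, and the 30-day window is simply the threshold from the stats above.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
FRESHNESS_WINDOW = timedelta(days=30)  # threshold from the "3.2x more citations" stat


def lastmod_from_sitemap(xml_text: str) -> dict[str, date]:
    """Hypothetical helper: map each <loc> in a sitemap to the date part of its <lastmod>."""
    root = ET.fromstring(xml_text)
    pages = {}
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc and lastmod:
            # <lastmod> may include a time ("2024-05-28T10:00:00+00:00"); keep the date.
            pages[loc.strip()] = date.fromisoformat(lastmod.strip()[:10])
    return pages


def stale_pages(pages: dict[str, date], today: date,
                window: timedelta = FRESHNESS_WINDOW) -> list[str]:
    """Hypothetical helper: URLs last updated more than `window` ago, oldest first."""
    cutoff = today - window
    overdue = sorted((modified, url) for url, modified in pages.items() if modified < cutoff)
    return [url for _, url in overdue]


if __name__ == "__main__":
    pages = {
        "https://example.com/pricing": date(2024, 5, 28),
        "https://example.com/guide": date(2024, 3, 2),
        "https://example.com/faq": date(2024, 4, 20),
    }
    print(stale_pages(pages, today=date(2024, 6, 1)))
    # ['https://example.com/guide', 'https://example.com/faq']
```

A page modified exactly 30 days ago counts as fresh here; tighten the comparison if you prefer a stricter cutoff.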

---

## Perplexity AI

Perplexity always cites its sources with links. It uses Retrieval-Augmented Generation (RAG) with a 3-layer reranking system.

### How Source Selection Works

1. **Layer 1 (L1)** — Basic relevance retrieval
2. **Layer 2 (L2)** — Scoring on traditional ranking factors
3. **Layer 3 (L3)** — ML models evaluate quality (and can discard entire result sets)

### Key Ranking Signals

| Signal | Details |
|--------|---------|
| Authoritative domain lists | Manually curated lists; domains such as Amazon, GitHub, and academic sites get an inherent boost |
| Freshness | Time decay algorithm; new content is evaluated quickly |
| Semantic relevance | Content similarity to the query (not keyword matching) |
| Topical weighting | Tech, AI, and science topics get visibility multipliers |
| Early engagement | Early clicks on new posts significantly boost visibility |

### Unique to Perplexity

- **FAQ Schema (JSON-LD)** — Pages with FAQ blocks are cited more often
- **PDF documents** — Publicly hosted PDFs are prioritized for citation
- **Content velocity** — Publishing speed matters more than keyword density
- **Semantic payloads** — Clear, self-contained (atomic) paragraphs are preferred

### What to Optimize

- Allow PerplexityBot in robots.txt
- Implement FAQPage Schema markup
- Create publicly accessible PDF resources (whitepapers, guides)
- Use Article schema with timestamps
- Focus on semantic relevance over keywords
- Build topical authority in your niche
- Write clear, self-contained paragraphs
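FAQPage markup follows a fixed schema.org shape: a `FAQPage` whose `mainEntity` is a list of `Question` items, each with an `acceptedAnswer`. A minimal sketch, assuming you keep question/answer pairs in your own content pipeline. `faq_page_jsonld` is a hypothetical helper and the Q&A text is placeholder copy.

```python
import json


def faq_page_jsonld(faqs: list[tuple[str, str]]) -> str:
    """Hypothetical helper: schema.org FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(faq_page_jsonld([
        ("What is generative engine optimization?",  # placeholder Q&A
         "GEO is the practice of structuring content so AI search engines can cite it."),
        ("How often should content be updated?",
         "For competitive topics, refresh key pages at least every 30 days."),
    ]))
```

Keeping each answer self-contained matches the "atomic paragraph" preference noted above.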

---

## Microsoft Copilot

Copilot is integrated into Edge, Windows, Microsoft 365, and Bing Search. It uses the **Bing index** as its primary data source.

### Key Ranking Signals

| Signal | Details |
|--------|---------|
| Bing indexing | Must be indexed by Bing (required baseline) |
| Microsoft ecosystem | Mentions on LinkedIn and GitHub provide a boost |
| Page speed | < 2 seconds load time |
| Schema markup | Helps Copilot understand content context |
| Entity clarity | Clear definitions of entities and concepts |

### What to Optimize

- Submit your site to Bing Webmaster Tools
- Use IndexNow for faster indexing of new content
- Optimize page speed (< 2 seconds)
- Write clear entity definitions in content
- Build presence on LinkedIn and GitHub
- Ensure Bingbot can crawl all important pages
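An IndexNow submission is a single HTTP POST. This sketch follows the public IndexNow protocol: a JSON body with `host`, `key`, `keyLocation`, and `urlList`, posted to a participating endpoint such as `https://api.indexnow.org/indexnow`. It assumes you have already generated a key and serve it at the site root as `<key>.txt`. The host, key, and URLs are placeholders, and `build_indexnow_request` is a hypothetical helper. It only builds the request, so nothing is sent until you uncomment the `urlopen` lines. Check the IndexNow documentation for current endpoints and response codes.

```python
import json
import re
import urllib.request
from urllib.parse import urlparse

# Endpoint and key rules as described in the public IndexNow protocol.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"
KEY_PATTERN = re.compile(r"[A-Za-z0-9-]{8,128}")


def build_indexnow_request(host: str, key: str, urls: list[str]) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) an IndexNow POST for `host`."""
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError("key must be 8-128 characters of letters, digits, or '-'")
    foreign = [u for u in urls if urlparse(u).hostname != host.lower()]
    if foreign:
        raise ValueError(f"URLs must belong to {host}: {foreign}")
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is served from the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_indexnow_request(
        host="www.example.com",  # placeholder site, key, and URL
        key="0123456789abcdef0123456789abcdef",
        urls=["https://www.example.com/blog/new-post"],
    )
    print(req.get_method(), req.full_url)
    # Uncomment to submit (network call):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```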

---

## Claude AI

Claude uses **Brave Search** (not Google or Bing) when web search is enabled.

### Key Characteristics

| Signal | Details |
|--------|---------|
| Brave index | Must be indexed by Brave Search |
| Factual density | Data-rich content strongly preferred |
| Structural clarity | Information is easy to extract |
| Source authority | Trustworthy, well-sourced content |
| Selectivity | Crawl-to-refer ratio of 38,065:1 (roughly one referral per 38,065 crawls; extremely selective) |

Claude consumes vast amounts of content but cites very selectively. Quality and relevance are critical.

### What to Optimize

- Ensure Brave Search can find your content
- Allow ClaudeBot and anthropic-ai in robots.txt
- Create high factual density content (specific numbers, sources)
- Use clear, extractable structure
- Cite authoritative sources
- Focus on being the most factually accurate source for your topic

---

## robots.txt Configuration

Allow all major AI bots:

```
# Search engine bots
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# AI crawlers (search, browsing, and training)
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Google-Extended
Allow: /

# Sitemap
Sitemap: https://example.com/sitemap.xml
```

### Selective Blocking

If you want to allow AI search citation but block AI training:
- **GPTBot** — OpenAI's training crawler. ChatGPT search results rely on OAI-SearchBot, so you can block GPTBot and remain eligible for ChatGPT citation as long as OAI-SearchBot is allowed.
- **Google-Extended** — Controls whether Google may use your content for Gemini apps and Vertex AI. Blocking it does not affect Google Search, including AI Overviews, which follow Googlebot rules.
- **CCBot** — Used by Common Crawl for AI training datasets. Safe to block if you only want search citation.
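A robots.txt following the selective approach above might look like the one embedded in this sketch: it disallows GPTBot, CCBot, and Google-Extended while leaving every other agent, including OAI-SearchBot, allowed. It is illustrative only. The domain is a placeholder, `allowed` is a hypothetical helper built on Python's standard-library `urllib.robotparser` (simple prefix matching), and crawler names should be confirmed against each vendor's current documentation before deploying.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: opt out of AI training, stay available to search crawlers.
SELECTIVE_ROBOTS_TXT = """\
# Training-focused agents
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

# Everyone else, including OAI-SearchBot, PerplexityBot, Googlebot, Bingbot
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Hypothetical helper: True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ["GPTBot", "CCBot", "Google-Extended", "OAI-SearchBot", "PerplexityBot", "Googlebot"]:
        print(f"{agent:16} {'allowed' if allowed(SELECTIVE_ROBOTS_TXT, agent) else 'blocked'}")
```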

---

## Optimization Priority by Platform

If you can't optimize for everything, prioritize by your audience:

| Priority | If Your Audience Uses | Focus On |
|----------|----------------------|----------|
| 1 | Google (everyone) | AI Overviews: Schema, E-E-A-T, topical authority |
| 2 | ChatGPT (tech, business) | Domain authority, freshness, content-answer fit |
| 3 | Perplexity (researchers, early adopters) | FAQ Schema, semantic relevance, PDFs |
| 4 | Copilot (enterprise, Microsoft shops) | Bing indexing, LinkedIn presence |
| 5 | Claude (developers, analysts) | Brave indexing, factual density |

### Universal Actions (Do These First)

1. Allow all AI bots in robots.txt
2. Implement Schema markup (FAQPage, Article, Organization)
3. Include statistics with citations in content
4. Update content regularly (within 30 days for competitive topics)
5. Use clear heading structure (H1 > H2 > H3)
6. Ensure page speed < 2 seconds
7. Add author bios with credentials