Corey Haines 5436b34b98 feat: add GEO research data and platform-specific ranking factors

Add Princeton GEO study (KDD 2024) 9-method table with exact visibility
percentages to SKILL.md. Add AI bot robots.txt configuration. Add keyword
stuffing warning (-10% visibility). Add platform-ranking-factors.md reference
with per-platform details: Google AI Overviews (5-stage pipeline), ChatGPT
(content-answer fit 55%, 30-day freshness 3.2x), Perplexity (3-layer RAG,
FAQ Schema priority), Copilot (Bing index + MS ecosystem), Claude (Brave
Search, 38K:1 crawl-to-refer ratio).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-18 17:14:33 -08:00

8.6 KiB


Platform-Specific Ranking Factors

How each AI search platform selects sources and what to optimize for each.

Sources: Princeton GEO study (KDD 2024), SE Ranking study (129K domains), Ziptie content-answer fit analysis (400K pages).

Quick Reference

Platform	Primary Index	Key Factor	Unique Requirement
Google AI Overviews	Google	E-E-A-T + structured data	Knowledge Graph presence
ChatGPT	Bing-based web	Domain authority + freshness	Content-answer fit
Perplexity	Own + Google	Semantic relevance	FAQ Schema, PDF hosting
Copilot	Bing	Bing indexing	Microsoft ecosystem presence
Claude	Brave Search	Factual density	Brave Search indexing
Gemini	Google	Google index + Knowledge Graph	Structured data

Google AI Overviews

Google's AI Overviews synthesize answers from multiple sources using a 5-stage pipeline.

How Source Selection Works

Retrieval — Identify candidate sources from Google index
Semantic ranking — Evaluate topical relevance
LLM re-ranking — Assess contextual fit using Gemini
E-E-A-T evaluation — Filter for experience, expertise, authoritativeness, and trustworthiness
Data fusion — Synthesize from multiple sources with citations
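The five stages behave like a funnel. The Python sketch below is a toy model under stated assumptions: keyword overlap stands in for retrieval, and the `topical_relevance`, `contextual_fit`, and `eeat` fields are hypothetical precomputed scores that stand in for Google's semantic ranker, Gemini re-ranking, and E-E-A-T evaluation. It is not Google's implementation. It only shows how each stage narrows the candidate set before the answer cites the survivors.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    summary: str
    topical_relevance: float  # hypothetical stage-2 score, 0-1
    contextual_fit: float     # hypothetical stand-in for Gemini re-ranking, 0-1
    eeat: float               # hypothetical E-E-A-T score, 0-1


def ai_overview_pipeline(index: list[Page], query_terms: set[str], k: int = 3,
                         shortlist_size: int = 10,
                         eeat_threshold: float = 0.5) -> tuple[str, list[str]]:
    # 1. Retrieval: any page whose summary shares a term with the query
    candidates = [p for p in index if query_terms & set(p.summary.lower().split())]
    # 2. Semantic ranking: order candidates by topical relevance
    candidates.sort(key=lambda p: p.topical_relevance, reverse=True)
    # 3. LLM re-ranking: re-order the shortlist by contextual fit
    shortlist = sorted(candidates[:shortlist_size],
                       key=lambda p: p.contextual_fit, reverse=True)
    # 4. E-E-A-T evaluation: drop pages below the trust threshold
    trusted = [p for p in shortlist if p.eeat >= eeat_threshold]
    # 5. Data fusion: join the top-k summaries with numbered citations
    cited = trusted[:k]
    answer = " ".join(f"{p.summary} [{i}]" for i, p in enumerate(cited, start=1))
    return answer, [p.url for p in cited]


if __name__ == "__main__":
    index = [
        Page("https://a.example/geo", "geo means generative engine optimization", 0.9, 0.7, 0.8),
        Page("https://b.example/geo-guide", "geo adds citations and statistics", 0.8, 0.9, 0.9),
        Page("https://c.example/spam", "geo geo geo buy now", 0.95, 0.95, 0.2),
        Page("https://d.example/recipes", "pasta recipes", 0.1, 0.1, 0.9),
    ]
    answer, citations = ai_overview_pipeline(index, {"geo"})
    print(answer)
    print(citations)
```

In this toy run the spam page scores highest on relevance and fit yet is never cited, because stage 4 filters it out.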

Key Stats

Signal	Impact
Authoritative citations in content	+132% visibility
Authoritative tone	+89% visibility
Structured data (Schema)	+30-40% visibility
Overlap with traditional Top 10	Only 15% (AI Overviews cite different pages)

What to Optimize

Implement comprehensive Schema markup (Article, FAQPage, HowTo, Product)
Build topical authority with content clusters and internal linking
Include authoritative citations and references in content
Add E-E-A-T signals (author bios, credentials, experience)
Target informational "how-to" and "what is" queries
Establish your brand as an entity in Google's Knowledge Graph (a Wikipedia article helps)
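For the Schema and E-E-A-T items above, here is a minimal Python sketch that generates `Article` markup with author details. The property names (`headline`, `datePublished`, `dateModified`, `author`, `jobTitle`, `sameAs`) are standard schema.org vocabulary. The author, URLs, and dates are placeholders, and which properties Google actually weighs is an open question this sketch does not answer.

```python
import json


def article_jsonld(headline: str, published: str, modified: str,
                   author_name: str, author_title: str, author_url: str,
                   same_as: list[str]) -> dict:
    """Build schema.org Article markup with an author Person (E-E-A-T signals)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published,    # ISO 8601 date
        "dateModified": modified,      # keep current; freshness matters on several platforms
        "author": {
            "@type": "Person",
            "name": author_name,
            "jobTitle": author_title,  # credentials
            "url": author_url,         # author bio page
            "sameAs": same_as,         # profiles that corroborate the author's identity
        },
    }


if __name__ == "__main__":
    data = article_jsonld(
        headline="What Is Generative Engine Optimization?",
        published="2026-01-05",
        modified="2026-02-18",
        author_name="Jane Doe",                              # placeholder author
        author_title="Head of SEO",
        author_url="https://example.com/authors/jane-doe",
        same_as=["https://example.com/profiles/jane-doe"],  # placeholder profile
    )
    # Embed this output in the page inside <script type="application/ld+json">
    print(json.dumps(data, indent=2))
```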

ChatGPT (with Search)

ChatGPT uses a Bing-based web index for real-time search, combined with its training data.

How Source Selection Works

Two-phase system:

Pre-training knowledge — Built from training data (Wikipedia, books, web)
Real-time retrieval — Web browsing for current information

Ranking Factor Weights (SE Ranking Study, 129K Domains)

Factor	Weight
Authority & credibility	~40%
Content quality & utility	~35%
Platform trust	~25%

Content-Answer Fit Analysis (400K Pages Study)

Factor	Weight
Content-answer fit	55% — most important; match ChatGPT's response style
On-page structure	14% — clear headings, formatting
Domain authority	12% — helps retrieval, not citation
Query relevance	12% — match user intent
Content consensus	7% — agreement among sources
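Read literally, both weight tables describe linear scoring models. The Python sketch below treats them that way, which is an assumption: neither study publishes a formula, the per-factor scores (0 to 1) are hypothetical inputs you would estimate yourself, and a plain weighted sum ignores nuances such as domain authority helping retrieval rather than citation. It illustrates why strong content-answer fit can outweigh strong domain authority.

```python
import math

# Weights from the two tables above, as fractions of 1.0
SE_RANKING_WEIGHTS = {
    "authority": 0.40,
    "content_quality": 0.35,
    "platform_trust": 0.25,
}
CONTENT_ANSWER_FIT_WEIGHTS = {
    "content_answer_fit": 0.55,
    "on_page_structure": 0.14,
    "domain_authority": 0.12,
    "query_relevance": 0.12,
    "content_consensus": 0.07,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-factor scores (each 0-1); weights must sum to 1."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    baseline = dict.fromkeys(CONTENT_ANSWER_FIT_WEIGHTS, 0.3)
    high_authority = {**baseline, "domain_authority": 1.0}
    high_fit = {**baseline, "content_answer_fit": 1.0}
    print(round(weighted_score(high_authority, CONTENT_ANSWER_FIT_WEIGHTS), 3))  # 0.384
    print(round(weighted_score(high_fit, CONTENT_ANSWER_FIT_WEIGHTS), 3))        # 0.685
```

Under these assumptions, maxing out content-answer fit lifts the score far more than maxing out domain authority, which is the point the 400K-page table makes.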

Key Stats

Metric	Impact
>350K referring domains	8.4 average citations
Domain trust score 97-100	8.4 average citations (vs 6 for scores 91-96)
Content updated within 30 days	3.2x more citations
Branded vs third-party domains	Branded cited 11.1 points more

Top Citation Sources

Wikipedia (7.8%)
Reddit (1.8%)
Forbes (1.1%)
Brand official sites (variable)
Academic sources (variable)

What to Optimize

Build a strong backlink profile (prioritize quality over quantity; >350K referring domains marks the elite tier)
Update content frequently (within 30 days for competitive topics)
Match ChatGPT's conversational answer style in your content
Include verifiable statistics with citations
Use clear H1/H2/H3 heading structure
Build high domain trust score
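To act on the 30-day freshness guideline, a small Python sketch can flag stale pages. It assumes you already have a mapping of URL to last-modified date (for example from your CMS or sitemap `<lastmod>` values); the 30-day window comes from the stats above, and the helper name `stale_pages` is hypothetical.

```python
from datetime import date, timedelta

FRESHNESS_WINDOW = timedelta(days=30)  # from the "updated within 30 days" stat


def stale_pages(last_modified: dict[str, date], today: date,
                window: timedelta = FRESHNESS_WINDOW) -> list[str]:
    """URLs not updated within `window`, oldest first."""
    stale = [(modified, url) for url, modified in last_modified.items()
             if today - modified > window]
    return [url for _, url in sorted(stale)]


if __name__ == "__main__":
    pages = {
        "/pricing": date(2026, 2, 1),
        "/guide/geo": date(2025, 12, 1),
        "/blog/ai-search": date(2026, 1, 10),
    }
    print(stale_pages(pages, today=date(2026, 2, 18)))  # ['/guide/geo', '/blog/ai-search']
```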

Perplexity AI

Perplexity always cites its sources with links. It uses Retrieval-Augmented Generation (RAG) with a 3-layer reranking system.

How Source Selection Works

Layer 1 (L1) — Basic relevance retrieval
Layer 2 (L2) — Scoring on traditional ranking factors
Layer 3 (L3) — ML models for quality evaluation (can discard entire result sets)
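The distinctive part of this design is that L3 can reject everything. The Python sketch below models that behavior under explicit assumptions: each result is a dict with hypothetical `rank_score` (L2) and `quality` (L3) fields, keyword overlap stands in for L1 retrieval, and the whole set is dropped when its average quality falls below a threshold. Perplexity has not published its actual models or thresholds.

```python
def perplexity_style_rerank(results: list[dict], query_terms: set[str], k: int = 5,
                            min_set_quality: float = 0.5) -> list[dict]:
    """Toy 3-layer reranker; L3 may discard the whole result set."""
    # L1: basic relevance retrieval (keyword overlap as a stand-in)
    layer1 = [r for r in results if query_terms & set(r["text"].lower().split())]
    # L2: keep the top-k by a traditional ranking score
    layer2 = sorted(layer1, key=lambda r: r["rank_score"], reverse=True)[:k]
    if not layer2:
        return []
    # L3: quality model; if the set as a whole is weak, return nothing
    average_quality = sum(r["quality"] for r in layer2) / len(layer2)
    if average_quality < min_set_quality:
        return []
    return sorted(layer2, key=lambda r: r["quality"], reverse=True)


if __name__ == "__main__":
    results = [
        {"url": "a", "text": "rag pipeline guide", "rank_score": 0.9, "quality": 0.6},
        {"url": "b", "text": "rag explained", "rank_score": 0.5, "quality": 0.9},
        {"url": "c", "text": "cooking tips", "rank_score": 1.0, "quality": 1.0},
    ]
    print([r["url"] for r in perplexity_style_rerank(results, {"rag"})])
```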

Key Ranking Signals

Signal	Details
Authoritative domain lists	Manually curated lists give domains such as Amazon, GitHub, and academic sites an inherent boost
Freshness	Time decay algorithm; new content evaluated quickly
Semantic relevance	Content similarity to query (not keyword matching)
Topical weighting	Tech, AI, Science topics get visibility multipliers
Early engagement	First clicks on new posts significantly boost visibility
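The freshness signal is described only as a time decay algorithm. Exponential decay with a half-life is one common way to express that idea. The sketch below uses it with a hypothetical 30-day half-life; both the functional form and the parameter are illustrative assumptions, not Perplexity's published method.

```python
import math


def decayed_score(base_score: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential time decay: the score halves every `half_life_days` days."""
    if age_days < 0 or half_life_days <= 0:
        raise ValueError("age_days must be >= 0 and half_life_days > 0")
    return base_score * math.exp(-math.log(2) * age_days / half_life_days)


if __name__ == "__main__":
    for age in (0, 7, 30, 90):
        print(age, round(decayed_score(1.0, age), 3))
```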

Unique to Perplexity

FAQ Schema (JSON-LD) — Pages with FAQ blocks are cited more often
PDF documents — Publicly hosted PDFs are prioritized for citation
Content velocity — Speed of publishing matters more than keyword density
Semantic payloads — Clear, self-contained (atomic) paragraphs are preferred

What to Optimize

Allow PerplexityBot in robots.txt
Implement FAQPage Schema markup
Create publicly accessible PDF resources (whitepapers, guides)
Use Article schema with timestamps
Focus on semantic relevance over keywords
Build topical authority in your niche
Write clear, self-contained paragraphs
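A minimal Python sketch for generating the FAQPage markup recommended above. `FAQPage`, `mainEntity`, `Question`, `acceptedAnswer`, and `Answer` are schema.org vocabulary; the helper names and the sample questions are placeholders. Keeping each answer self-contained also matches the preference for atomic paragraphs.

```python
import json


def faq_page_jsonld(qa_pairs: list[tuple[str, str]]) -> dict:
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def to_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the script tag that is embedded in the page."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2, ensure_ascii=False)
            + "\n</script>")


if __name__ == "__main__":
    faq = faq_page_jsonld([
        ("How often should content be updated?",
         "For competitive topics, aim to update content within 30 days."),
        ("Which bots should robots.txt allow?",
         "Allow the AI search bots you want citations from, such as PerplexityBot."),
    ])
    print(to_script_tag(faq))
```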

Microsoft Copilot

Copilot is integrated into Edge, Windows, Microsoft 365, and Bing Search. It uses the Bing Index as its primary data source.

Key Ranking Signals

Signal	Details
Bing indexing	Must be indexed by Bing (required baseline)
Microsoft ecosystem	Mentions on LinkedIn and GitHub provide a boost
Page speed	< 2 seconds load time
Schema markup	Helps Copilot understand content context
Entity clarity	Clear definitions of entities and concepts

What to Optimize

Submit site to Bing Webmaster Tools
Use IndexNow for faster indexing of new content
Optimize page speed (< 2 seconds)
Write clear entity definitions in content
Build presence on LinkedIn and GitHub
Ensure Bingbot can crawl all important pages
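IndexNow works by POSTing a JSON list of changed URLs to a participating endpoint, with a key file hosted on your site to prove ownership. The Python sketch below only builds the request with the standard library so it can be inspected before sending. The host, key, and URLs are placeholders, and the endpoint and key rules encoded here should be checked against the current IndexNow documentation before you rely on them.

```python
import json
import re
import urllib.request

# Shared IndexNow endpoint (an assumption to verify against the current IndexNow docs)
INDEXNOW_ENDPOINT = "https://api.indexnow.org/IndexNow"
KEY_PATTERN = re.compile(r"[A-Za-z0-9-]{8,128}")


def build_indexnow_request(host: str, key: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) an IndexNow POST request for changed URLs."""
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError("key must be 8-128 characters of a-z, A-Z, 0-9 or '-'")
    payload = {
        "host": host,
        "key": key,
        # The key file must be served at this URL and contain the key itself
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_indexnow_request(
        "www.example.com",
        "0123456789abcdef0123456789abcdef",  # placeholder key
        ["https://www.example.com/blog/new-post"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To submit for real: urllib.request.urlopen(req)
```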

Claude AI

Claude uses Brave Search (not Google or Bing) when web search is enabled.

Key Characteristics

Signal	Details
Brave Index	Must be indexed by Brave Search
Factual density	Data-rich content strongly preferred
Structural clarity	Easy to extract information
Source authority	Trustworthy, well-sourced content
Selectivity	Crawl-to-refer ratio of 38,065:1 (extremely selective)

Anthropic's crawlers consume vast amounts of content, but Claude cites very selectively, so quality and relevance are critical.
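A ratio like 38,065:1 means roughly 38,065 crawler requests for every visit referred back to the site. You can estimate the same figure for your own site from access logs. The Python sketch below assumes the logs are already parsed into (user agent, referrer) pairs and that referred visits carry a `claude.ai` referrer. Both the log format and the referrer host are assumptions to check against your own data.

```python
import math
from urllib.parse import urlparse


def crawl_to_refer_ratio(entries: list[tuple[str, str]],
                         bot_token: str = "ClaudeBot",
                         referrer_host: str = "claude.ai") -> float:
    """Crawler hits per referred visit; math.inf if there were no referrals."""
    crawls = sum(1 for user_agent, _ in entries if bot_token.lower() in user_agent.lower())
    referrals = sum(1 for _, referrer in entries
                    if urlparse(referrer).hostname == referrer_host)
    return crawls / referrals if referrals else math.inf


if __name__ == "__main__":
    sample = (
        [("Mozilla/5.0 (compatible; ClaudeBot/1.0)", "")] * 6  # hypothetical UA string
        + [("Mozilla/5.0 (Windows NT 10.0)", "https://claude.ai/")] * 2
        + [("Mozilla/5.0 (Windows NT 10.0)", "https://www.google.com/")]
    )
    print(f"{crawl_to_refer_ratio(sample):.1f}:1")  # 3.0:1
```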

What to Optimize

Ensure Brave Search can find your content
Allow ClaudeBot and anthropic-ai in robots.txt
Create high factual density content (specific numbers, sources)
Use clear, extractable structure
Cite authoritative sources
Focus on being the most factually accurate source for your topic

robots.txt Configuration

Allow all major AI bots:

```
# Search engine bots
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# AI search bots
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Google-Extended
Allow: /

# Sitemap
Sitemap: https://example.com/sitemap.xml
```
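Before deploying, you can check how a standards-following parser reads the file. The Python sketch below runs the standard library's `urllib.robotparser` over the configuration above plus a `CCBot` block, added here only to illustrate selective blocking. Python's parser matches user-agent names by substring and allows any bot that has no matching group; real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Configuration from above, plus an illustrative CCBot block (not in the original)
ROBOTS_TXT = """\
# Search engine bots
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# AI search bots
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Disallow: /

# Sitemap
Sitemap: https://example.com/sitemap.xml
"""


def check_bots(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return {user_agent: allowed?} for one URL under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    bots = ["Googlebot", "GPTBot", "PerplexityBot", "ClaudeBot",
            "anthropic-ai", "Google-Extended", "CCBot"]
    for bot, allowed in check_bots(ROBOTS_TXT, "https://example.com/blog/post", bots).items():
        print(f"{bot:16} {'allowed' if allowed else 'blocked'}")
```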

Selective Blocking

If you want to allow AI search citation but block AI training:

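GPTBot — OpenAI's crawler for collecting model training data. OpenAI documents a separate crawler, OAI-SearchBot, for surfacing sites in ChatGPT search, so GPTBot can be blocked while OAI-SearchBot stays allowed.
Google-Extended — Controls whether Google may use your content for Gemini models. Blocking it doesn't affect Google Search, including AI Overviews, which rely on Googlebot.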
CCBot — Used by Common Crawl for AI training datasets. Safe to block if you only want search citation.

Optimization Priority by Platform

If you can't optimize for everything, prioritize by your audience:

Priority	If Your Audience Uses	Focus On
1	Google (everyone)	AI Overviews: Schema, E-E-A-T, topical authority
2	ChatGPT (tech, business)	Domain authority, freshness, content-answer fit
3	Perplexity (researchers, early adopters)	FAQ Schema, semantic relevance, PDFs
4	Copilot (enterprise, Microsoft shops)	Bing indexing, LinkedIn presence
5	Claude (developers, analysts)	Brave indexing, factual density

Universal Actions (Do These First)

Allow all AI bots in robots.txt
Implement Schema markup (FAQPage, Article, Organization)
Include statistics with citations in content
Update content regularly (within 30 days for competitive topics)
Use clear heading structure (H1 > H2 > H3)
Ensure page speed < 2 seconds
Add author bios with credentials
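The heading-structure item is easy to check automatically. Here is a minimal Python sketch using the standard library's `html.parser` that collects heading levels in document order and reports skipped levels. Requiring exactly one `<h1>` is a common convention assumed here, not a rule stated by the studies above.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect heading levels (1-6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names lowercased
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list[str]:
    """Report a missing or duplicate <h1> and any skipped heading levels."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one <h1>, found {h1_count}")
    previous = 0
    for level in levels:
        if level > previous + 1:
            if previous:
                problems.append(f"<h{level}> follows <h{previous}> (skipped level)")
            else:
                problems.append(f"first heading is <h{level}>, not <h1>")
        previous = level
    return problems


if __name__ == "__main__":
    page = "<h1>GEO Guide</h1><h2>Platforms</h2><h4>Perplexity</h4>"
    print(heading_problems(page))
```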
