68 Million AI Crawler Visits Reveal What Drives AI Search Visibility — Plus the Ghost Citation Problem
A study of 68.9 million AI crawler visits across 858,457 sites shows OpenAI controls 81% of AI crawl traffic. Separate research reveals 62% of AI citations are ghost citations where brands get a link but zero name recognition.
A study published by Search Engine Journal on April 20, 2026, analyzed 68.9 million AI crawler visits across 858,457 websites during February 2026 — the most granular public look at AI crawler behavior yet. Separately, Kevin Indig's research across 3,981 domains reveals that 62% of all AI citations are ghost citations where the brand gets a link but zero name recognition in the answer text.
Together, these studies reshape what AI SEO actually means in practice: it's no longer about whether AI crawlers find you, but whether they credit you when they use your content.
1. The AI Crawler Landscape: 68.9 Million Visits in One Month
The most significant finding is the shift in why AI bots crawl. The majority of AI crawler traffic is no longer about building training datasets. Instead, 56.9% of all AI crawler activity (39.8 million visits) is classified as User Fetch — real-time content retrieval triggered by a live user query in ChatGPT, Perplexity, or similar AI search interfaces.
| Crawl Purpose | Share | Volume | Primary Use |
|---|---|---|---|
| User Fetch | 56.9% | 39.8M | Real-time answers to live queries |
| Training | 28.8% | ~19.8M | Model learning via GPTBot and others |
| Discovery | 14.3% | ~9.9M | Content indexing across multiple systems |
This aligns with trends covered in our analysis of AI crawler visit patterns and Stanford's adoption research — the shift from training crawlers to real-time retrieval bots is accelerating.
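To see how this split looks on your own site, you can bucket server-log user agents by crawl purpose. The sketch below is a rough illustration, not the study's methodology: the user-agent tokens are publicly documented crawler names, but the assignment of each token to User Fetch, Training, or Discovery is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional

# ASSUMPTION: this token-to-purpose grouping is our own guess at how the
# study's three categories map onto known crawler names. Adjust as needed.
PURPOSE_BY_TOKEN = {
    "ChatGPT-User": "user_fetch",
    "Perplexity-User": "user_fetch",
    "Claude-User": "user_fetch",
    "GPTBot": "training",
    "ClaudeBot": "training",
    "OAI-SearchBot": "discovery",
    "PerplexityBot": "discovery",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the assumed crawl purpose for a user-agent string, or None."""
    lowered = user_agent.lower()
    for token, purpose in PURPOSE_BY_TOKEN.items():
        if token.lower() in lowered:
            return purpose
    return None  # not a recognized AI crawler


def purpose_shares(user_agents: list[str]) -> dict[str, float]:
    """Percentage of recognized AI crawler hits per purpose."""
    counts = Counter()
    for user_agent in user_agents:
        purpose = classify_user_agent(user_agent)
        if purpose is not None:
            counts[purpose] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {p: round(100 * n / total, 1) for p, n in counts.items()}
```

Feeding `purpose_shares` the user-agent column of an access log gives a per-site version of the table above, restricted to the crawlers listed in the mapping.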
2. Who Is Crawling: OpenAI Owns 81% of AI Bot Traffic
The concentration of AI crawler traffic is extreme. OpenAI accounts for 81.0% of all AI crawler visits (55.8 million out of 68.9 million), making it the dominant force in AI web crawling by an enormous margin.
| Company | Visits | Market Share |
|---|---|---|
| OpenAI | 55.8 million | 81.0% |
| Anthropic (Claude) | 11.5 million | 16.6% |
| Perplexity | 1.3 million | 1.8% |
| Google (Gemini) | 380,000 | 0.6% |
Google's low crawler volume is notable. At just 380,000 visits (0.6%), Gemini's crawling footprint is roughly 1/147th the size of OpenAI's. This likely reflects Google's ability to reuse its existing Googlebot index rather than deploy separate AI-specific crawlers.
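The share and ratio figures can be checked from the visit counts in the table. This is a small worked example using only the numbers reported above; because those counts are rounded, recomputed shares for the smaller crawlers can differ from the published ones by about 0.1 point.

```python
# Visit counts as reported in the table (rounded by the source).
visits = {
    "OpenAI": 55_800_000,
    "Anthropic": 11_500_000,
    "Perplexity": 1_300_000,
    "Google": 380_000,
}
TOTAL_VISITS = 68_900_000  # study-wide total reported in the article


def share_pct(count: int, total: int = TOTAL_VISITS) -> float:
    """Share of total AI crawler visits, as a percentage with one decimal."""
    return round(100 * count / total, 1)


openai_share = share_pct(visits["OpenAI"])
openai_to_google = visits["OpenAI"] / visits["Google"]
```

Here `openai_share` reproduces the 81.0% figure, and `openai_to_google` rounds to 147, the ratio quoted for OpenAI versus Gemini.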
Year-Over-Year LLM Referral Traffic Growth
Referral traffic from LLM-powered search grew 72.7% overall, with Claude and Copilot growing fastest from very small bases:
| Platform | Previous Period | Current Period | Growth |
|---|---|---|---|
| Total LLM Referrals | 93,484 | 161,469 | +72.7% |
| ChatGPT | 81,652 | 136,095 | +66.7% |
| Claude | 106 | 2,488 | +2,247% (23x) |
| Copilot | 22 | 9,560 | From near-zero |
| Perplexity | 11,533 | 13,157 | +14.1% |
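The Growth column is a plain percentage change between the two periods. A minimal sketch of that calculation, using the referral counts from the table (the function name is ours):

```python
def growth_pct(previous: int, current: int) -> float:
    """Percentage change from the previous period to the current one."""
    if previous <= 0:
        raise ValueError("previous period must be positive to compute growth")
    return round(100 * (current - previous) / previous, 1)


# Referral counts copied from the table above: (previous, current).
referrals = {
    "Total LLM Referrals": (93_484, 161_469),
    "ChatGPT": (81_652, 136_095),
    "Claude": (106, 2_488),
    "Perplexity": (11_533, 13_157),
}

growth = {name: growth_pct(prev, cur) for name, (prev, cur) in referrals.items()}
```

Small baselines inflate the percentage: Claude's jump from 106 to 2,488 referrals is a 23x multiple, but it is still a small fraction of ChatGPT's volume.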
3. What Drives AI Crawl Rates: Integrations, Schema, and Content Depth
The study isolates three categories of signals that predict higher AI crawl rates. Each contributes independently, and the effect compounds when combined.
Third-Party Integrations
| Integration | Crawl Rate (With) | Crawl Rate (Without) | Difference |
|---|---|---|---|
| Yext | 97.1% | ~58% | +38.9pp |
| Reviews Integration | 89.8% | 58.8% | +31.0pp |
Sites with Yext integration achieved a 97.1% crawl rate, meaning nearly every such site was visited by at least one AI crawler. The likely mechanism: Yext syndication distributes business data across the web, creating more reference points for AI systems to discover and validate.
Structured Data and Business Profile Signals
| Feature | Crawl Rate (With) | Crawl Rate (Without) | Lift |
|---|---|---|---|
| Google Business Profile Sync | 92.8% | 58.9% | +33.9pp |
| Local Schema Markup | 72.3% | 55.2% | +17.1pp |
| Dynamic Pages | 69.4% | 58.2% | +11.2pp |
| Ecommerce | 54.2% | 59.2% | -5.0pp |
The granularity of structured data matters. Sites with no schema fields completed had a 55.2% crawl rate, while sites with 10–11 fields completed reached 82%, a 26.8 percentage point improvement. Averaged across those fields, each additional completed schema field corresponds to roughly 2.7 percentage points of crawl probability. This reinforces findings from our Cloudflare Agent Readiness Score analysis on structured data's role in AI visibility.
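As an illustration of what "completed schema fields" can look like, here is a sketch that builds schema.org `LocalBusiness` JSON-LD and counts the populated properties. The business and all its values are placeholders, and which fields the study actually counted is not stated in the article, so the field selection and both helper functions are assumptions.

```python
import json

# Hypothetical business: every value below is a placeholder.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Dental Studio",
    "url": "https://www.example.com",
    "telephone": "+1-555-0100",
    "description": "Family dental practice offering checkups and cleanings.",
    "priceRange": "$$",
    "openingHours": "Mo-Fr 09:00-17:00",
    "image": "https://www.example.com/storefront.jpg",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Example Street",
        "addressLocality": "Springfield",
        "addressRegion": "IL",
        "postalCode": "62701",
        "addressCountry": "US",
    },
    "geo": {"@type": "GeoCoordinates", "latitude": 39.8, "longitude": -89.65},
    "sameAs": [],  # left empty on purpose, so it does not count as completed
}


def completed_fields(markup: dict) -> int:
    """Count top-level properties (ignoring @context/@type) with a non-empty value."""
    return sum(
        1 for key, value in markup.items()
        if not key.startswith("@") and value
    )


def to_script_tag(markup: dict) -> str:
    """Wrap the markup in the script tag used to embed JSON-LD in a page."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

In this example `completed_fields(local_business)` counts nine populated properties; filling in `sameAs` with profile URLs would bring it to ten, the range where the study reports the highest crawl rates.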
Content Depth: The 33x Multiplier
Content volume is the single strongest predictor of AI crawler visit frequency. Sites with 50+ blog posts received an average of 1,373.7 AI crawler visits, versus 41.6 for sites with no blog content.
This 33x difference is the largest effect size in the entire study, reinforcing that AI systems disproportionately target content-rich sites for real-time retrieval.
4. Business Impact: Crawled Sites Get 3.2x More Traffic
The study goes beyond crawl rates to measure business outcomes. Sites that received AI crawler visits consistently outperformed uncrawled sites:
| Metric | AI-Crawled Sites | Not Crawled | Multiplier |
|---|---|---|---|
| Avg. Human Sessions | 527.7 | 164.9 | 3.2x |
| Avg. Form Completions | 4.17 | 1.57 | 2.7x |
| Avg. Click-to-Call | 8.62 | 3.46 | 2.5x |
5. The Ghost Citation Problem: 62% of AI Citations Never Name You
Even if you win AI crawler attention and earn a citation in AI-generated answers, a separate problem looms: the AI probably won't mention your brand by name. Research from Kevin Indig, published in Growth Memo on April 21, 2026, quantifies what he calls the ghost citation problem.
The study tested four AI search engines — ChatGPT, Google AI Overviews, Gemini, and Google AI Mode — and found that 62% of all citations are ghost citations. A ghost citation occurs when the AI includes a source link but never mentions the brand name in the answer text.
| Citation Behavior | % of Domains |
|---|---|
| Cited by AI (link provided) | 74.9% |
| Mentioned by name in answer | 38.3% |
| Both cited AND mentioned | 13.2% |
| Ghost citations (cited, never named) | 61.7% |
The mechanism is structural, not random. Informational content (articles, guides, how-to pages) is the most vulnerable to ghost citation because the AI extracts facts without needing to endorse the source. Comparative and evaluative content ("best X for Y", product reviews, tool comparisons) generates brand mentions because the AI must name the entities being compared. This connects directly to the ChatGPT citation mechanics study showing only 1.93% of Reddit pages get cited despite heavy retrieval.
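The ghost citation definition can be expressed as a simple check on a single AI answer: was your domain linked, and was your brand named? The sketch below is one possible way to label an answer, not the method used in the research; the function names, the whole-word matching rule, and the example inputs are all assumptions.

```python
import re
from urllib.parse import urlparse


def cites_domain(cited_urls: list[str], domain: str) -> bool:
    """True if any cited URL is on the domain or one of its subdomains."""
    domain = domain.lower()
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def mentions_brand(answer_text: str, brand: str) -> bool:
    """True if the brand name appears as a whole word, ignoring case."""
    pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
    return re.search(pattern, answer_text, flags=re.IGNORECASE) is not None


def citation_outcome(answer_text: str, cited_urls: list[str],
                     brand: str, domain: str) -> str:
    """Label one answer using the four cells behind the table above."""
    cited = cites_domain(cited_urls, domain)
    mentioned = mentions_brand(answer_text, brand)
    if cited and mentioned:
        return "cited_and_mentioned"
    if cited:
        return "ghost_citation"  # link present, brand never named
    if mentioned:
        return "mention_only"
    return "absent"
```

The table's ghost citation figure follows the same logic: 74.9% of domains were cited, 13.2% were both cited and named, and the difference, 61.7%, were cited without ever being named.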
6. Platform Comparison: How Each AI Engine Handles Citations
Each AI search engine has a distinct citation personality, and understanding these differences is critical for prioritizing your GEO (generative engine optimization) strategy.
| AI Engine | Citation Link Rate | Brand Mention Rate | Behavior |
|---|---|---|---|
| ChatGPT | 87.0% | 20.7% | High cite, low mention |
| Gemini | 21.4% | 83.7% | Low cite, high mention |
| Google AI Mode | Moderate | ~37.7% | Balanced |
| Google AI Overviews | Moderate-high | Moderate | Citation-leaning |
ChatGPT and Gemini are near-opposites. ChatGPT cites sources 87% of the time but only names brands 20.7% of the time — it gives you the link but rarely the brand visibility. Gemini does the reverse: it mentions brand names 83.7% of the time but only provides a clickable citation link 21.4% of the time.
Geographic Variation in Brand Mentions
Brand mention rates vary significantly by country, which matters for international SEO strategy.
The cross-engine disagreement rate is also notable: 22% of 454 prompt-domain combinations produced different mention outcomes across engines, meaning the same brand is named by one AI and ghosted by another for the same query.
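A disagreement rate like this can be computed by checking, for each prompt-domain pair, whether every engine reached the same mention outcome. The sketch below assumes a simple data layout of our own (a dict keyed by prompt and domain, mapping engine name to a mentioned/not-mentioned flag); the research's actual data format is not described in the article.

```python
def disagreement_rate(outcomes: dict) -> float:
    """Percentage of prompt-domain pairs where engines disagree on naming the brand.

    `outcomes` maps (prompt, domain) -> {engine_name: brand_was_mentioned}.
    Pairs observed on fewer than two engines are skipped, since there is
    nothing to compare.
    """
    comparable = [results for results in outcomes.values() if len(results) >= 2]
    if not comparable:
        return 0.0
    disagreeing = sum(1 for results in comparable if len(set(results.values())) > 1)
    return round(100 * disagreeing / len(comparable), 1)


# Hypothetical observations, for illustration only.
example_outcomes = {
    ("best crm for startups", "example.com"): {"ChatGPT": False, "Gemini": True},
    ("best crm for startups", "example.org"): {"ChatGPT": True, "Gemini": True},
    ("how to rotate tires", "example.net"): {"ChatGPT": False, "Gemini": False},
    ("top running shoes", "example.io"): {"ChatGPT": False, "Gemini": False},
}
```

With the made-up data above, one of four pairs disagrees. In the research, 22% of 454 combinations works out to roughly 100 prompt-domain pairs where the engines split.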
7. Action Plan: Optimizing for Both AI Crawling and AI Citations
Combining findings from both studies, here is a concrete framework for improving both AI crawler visibility and brand citation quality.
For AI Crawl Visibility
For Brand Citation Quality
Frequently Asked Questions
What percentage of websites receive AI crawler visits?
According to an analysis of 858,457 websites in February 2026, 59% of sites received at least one AI crawler visit. Sites with over 10,000 human sessions had a 90.5% AI crawl rate, indicating that existing organic traffic strongly predicts AI crawler attention.
Which company sends the most AI crawlers?
OpenAI dominates AI crawling with 55.8 million visits out of 68.9 million total, representing 81.0% of all AI crawler traffic. Anthropic (Claude) is second at 16.6%, followed by Perplexity at 1.8% and Google Gemini at just 0.6%.
What is a ghost citation in AI search?
A ghost citation occurs when an AI search engine uses your content and includes a citation link to your site but never mentions your brand name in the answer text. Research across 3,981 domains found that 62% of all AI citations are ghost citations.
How does blog content volume affect AI crawler visits?
Sites with 50+ blog posts received an average of 1,373.7 AI crawler visits versus 41.6 for sites with no blog content — a 33x difference and the largest effect in the study.
Which AI search engine is best at mentioning brand names?
Gemini leads with an 83.7% brand mention rate but only generates citation links 21.4% of the time. ChatGPT does the opposite: it cites sources 87.0% of the time but only mentions brand names 20.7% of the time.
Does structured data help with AI crawler visibility?
Yes. Google Business Profile sync raised crawl rates from 58.9% to 92.8%. Local schema markup improved rates from 55.2% to 72.3%. Completing 10–11 schema fields reached 82% crawl rates. Third-party integrations like Yext achieved 97.1%.
