68 Million AI Crawler Visits Reveal What Drives AI Search Visibility — Plus the Ghost Citation Problem
A study of 68.9 million AI crawler visits across 858,457 sites shows OpenAI controls 81% of AI crawl traffic. Separate research reveals 62% of AI citations are ghost citations where brands get a link but zero name recognition.
Updated April 22, 2026 | Francisco Leon de Vivero
A study published by Search Engine Journal on April 20, 2026, analyzed 68.9 million AI crawler visits across 858,457 websites during February 2026 — the most granular public look at AI crawler behavior yet. Separately, Kevin Indig's research across 3,981 domains reveals that 62% of all AI citations are ghost citations where the brand gets a link but zero name recognition in the answer text.
Together, these studies reshape what AI SEO actually means in practice: it's no longer about whether AI crawlers find you, but whether they credit you when they use your content.
68.9M: AI crawler visits analyzed
858,457: Sites in the dataset
81%: OpenAI's share of AI crawl traffic
62%: AI citations that are ghost citations
## 1. The AI Crawler Scene: 68.9 Million Visits in One Month
The most significant finding is the shift in *why* AI bots crawl. The majority of AI crawler traffic is no longer about building training datasets. Instead, 56.9% of all AI crawler activity (39.8 million visits) is classified as User Fetch: real-time content retrieval triggered by a live user query in ChatGPT, Perplexity, or similar AI search interfaces.
Why this matters: AI crawlers are now primarily acting as intermediaries between your content and users asking questions right now. If your site blocks or throttles these bots, you're not just preventing training; you're also keeping your content out of real-time AI answers.

| Crawl Purpose | Share | Volume | Primary Use |
| --- | --- | --- | --- |
| User Fetch | 56.9% | 39.8M | Real-time answers to live queries |
| Training | 28.8% | ~19.8M | Model learning via GPTBot and others |
| Discovery | 14.3% | ~9.9M | Content indexing across multiple systems |

This aligns with trends covered in our analysis of AI crawler visit patterns and Stanford's adoption research: the shift from training crawlers to real-time retrieval bots is accelerating.
## 2. Who Is Crawling: OpenAI Owns 81% of AI Bot Traffic
The concentration of AI crawler traffic is extreme. OpenAI accounts for 81.0% of all AI crawler visits (55.8 million out of 68.9 million), making it the dominant force in AI web crawling by an enormous margin.

| Company | Visits | Market Share |
| --- | --- | --- |
| OpenAI | 55.8 million | 81.0% |
| Anthropic (Claude) | 11.5 million | 16.6% |
| Perplexity | 1.3 million | 1.8% |
| Google (Gemini) | 380,000 | 0.6% |

Google's low crawler volume is notable. At just 380,000 visits (0.6%), Gemini's crawling footprint is 147 times smaller than OpenAI's. This likely reflects Google's ability to use its existing Googlebot index rather than deploying separate AI-specific crawlers.
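To see how this breakdown looks on your own site, you can tally AI crawler hits by user agent. The sketch below is illustrative only: the user-agent tokens and the company and purpose labels attached to them are assumptions that should be checked against each vendor's current crawler documentation, the sample strings are invented, and `classify_agent` and `tally` are hypothetical helper names.

```python
from collections import Counter

# Assumed mapping of user-agent substrings to (company, crawl purpose).
# Verify the tokens against each vendor's current crawler documentation.
AI_AGENTS = {
    "GPTBot": ("OpenAI", "Training"),
    "ChatGPT-User": ("OpenAI", "User Fetch"),
    "OAI-SearchBot": ("OpenAI", "Discovery"),
    "ClaudeBot": ("Anthropic", "Training"),
    "PerplexityBot": ("Perplexity", "Discovery"),
}


def classify_agent(user_agent: str):
    """Return (company, purpose) for a known AI crawler, else None."""
    ua = user_agent.lower()
    for token, label in AI_AGENTS.items():
        if token.lower() in ua:
            return label
    return None


def tally(user_agents):
    """Count visits per company and per purpose, ignoring non-AI agents."""
    by_company, by_purpose = Counter(), Counter()
    for ua in user_agents:
        label = classify_agent(ua)
        if label is None:
            continue
        company, purpose = label
        by_company[company] += 1
        by_purpose[purpose] += 1
    return by_company, by_purpose


# Invented sample user-agent strings, standing in for parsed log lines.
SAMPLE = [
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
    "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
]

companies, purposes = tally(SAMPLE)
for company, visits in companies.most_common():
    share = 100 * visits / sum(companies.values())
    print(f"{company}: {visits} visits ({share:.1f}%)")
```

Matching on substrings keeps the sketch short. User-agent strings can be spoofed, so a production version would likely also verify requests against the IP ranges that vendors publish.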
Year-Over-Year LLM Referral Traffic Growth
Referral traffic from LLM-powered search is growing rapidly, with some platforms showing explosive growth:

| Platform | Previous Period | Current Period | Growth |
| --- | --- | --- | --- |
| Total LLM Referrals | 93,484 | 161,469 | +72.7% |
| ChatGPT | 81,652 | 136,095 | +66.7% |
| Claude | 106 | 2,488 | +2,247% (23x) |
| Copilot | 22 | 9,560 | From near-zero |
| Perplexity | 11,533 | 13,157 | +14.1% |

Claude referral traffic grew 23x year-over-year. While ChatGPT still dominates total referral volume (136,095 visits), Claude's jump from 106 to 2,488 referrals and Copilot's surge from 22 to 9,560 show that the LLM referral channel is diversifying rapidly.
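The growth figures follow directly from the previous-period and current-period counts. As a minimal sketch of that arithmetic (the function name `growth_pct` is hypothetical; the counts are the referral figures reported above):

```python
def growth_pct(previous: int, current: int) -> float:
    """Percentage change from the previous period to the current one."""
    if previous <= 0:
        raise ValueError("previous must be positive to compute growth")
    return (current - previous) / previous * 100


# (previous period, current period) referral counts reported above.
REFERRALS = {
    "Total LLM Referrals": (93_484, 161_469),
    "ChatGPT": (81_652, 136_095),
    "Claude": (106, 2_488),
    "Copilot": (22, 9_560),
    "Perplexity": (11_533, 13_157),
}

for platform, (prev, curr) in REFERRALS.items():
    print(f"{platform}: {growth_pct(prev, curr):+.1f}% ({curr / prev:.1f}x)")
```

With a baseline as small as Copilot's 22 referrals, the percentage becomes too large to be informative, which is presumably why that row is reported as "From near-zero" rather than as a rate.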
## 3. What Drives AI Crawl Rates: Integrations, Schema, and Content Depth
The study isolates three categories of signals that predict higher AI crawl rates. Each contributes independently, and the effect compounds when combined.
Third-Party Integrations

| Integration | Crawl Rate (With) | Crawl Rate (Without) | Difference |
| --- | --- | --- | --- |
| Yext | 97.1% | ~58% | +38.9pp |
| Reviews Integration | 89.8% | 58.8% | +31.0pp |

Sites with Yext integration achieved a 97.1% crawl rate, meaning nearly every site was visited by at least one AI crawler. The likely mechanism: Yext syndication distributes business data across the web, creating more reference points for AI systems to discover and validate.
Structured Data and Business Profile Signals

| Feature | Crawl Rate (With) | Crawl Rate (Without) | Lift |
| --- | --- | --- | --- |
| Google Business Profile Sync | 92.8% | 58.9% | +33.9pp |
| Local Schema Markup | 72.3% | 55.2% | +17.1pp |
| Active Pages | 69.4% | 58.2% | +11.2pp |
| Ecommerce | 54.2% | 59.2% | -5.0pp |

Ecommerce sites show a negative crawl correlation (-5.0pp). This may reflect that many ecommerce product pages lack the informational content depth that AI crawlers prioritize. Product catalogs with thin descriptions get deprioritized relative to content-rich informational sites.
The granularity of structured data matters. Sites with no schema fields completed had a 55.2% crawl rate. Sites with 10–11 fields completed reached 82%, a 26.8 percentage point improvement. Averaged over those fields, each additional completed schema field adds roughly 2.7 percentage points of crawl probability. This reinforces findings from our Cloudflare Agent Readiness Score analysis on structured data's role in AI visibility.
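One way to audit schema completeness is to build the markup programmatically and count which properties are actually filled in. The sketch below is an assumption-heavy illustration: the study does not list which fields it counted, so the properties here are simply common schema.org `LocalBusiness` properties, every business detail is a placeholder, and `completed_fields` and `missing_fields` are hypothetical helpers.

```python
import json

# Hypothetical business details; every value here is a placeholder.
business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Bakery",
    "url": "https://www.example.com",
    "telephone": "+1-555-0100",
    "description": "Neighborhood bakery specializing in sourdough.",
    "priceRange": "$$",
    "openingHours": "Mo-Sa 07:00-18:00",
    "image": "",  # left empty on purpose to show an incomplete field
    "sameAs": [],  # also empty: no social profiles listed yet
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
        "addressRegion": "IL",
        "postalCode": "62701",
        "addressCountry": "US",
    },
    "geo": {"@type": "GeoCoordinates", "latitude": 39.7817, "longitude": -89.6501},
}


def completed_fields(entity: dict) -> list[str]:
    """List top-level properties (not @-keywords) with a non-empty value."""
    return [k for k, v in entity.items() if not k.startswith("@") and v]


def missing_fields(entity: dict) -> list[str]:
    """List top-level properties that are present but empty."""
    return [k for k, v in entity.items() if not k.startswith("@") and not v]


done = completed_fields(business)
print(f"{len(done)} fields completed; still empty: {missing_fields(business)}")

# Drop empty properties before emitting the JSON-LD payload for the page.
payload = {k: v for k, v in business.items() if k.startswith("@") or v}
print(json.dumps(payload, indent=2))
```

The resulting JSON would typically be embedded in the page inside a `<script type="application/ld+json">` element. The empty `image` and `sameAs` entries are the kind of gap the completeness finding suggests closing.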
Content Depth: The 33x Multiplier
Content volume is the single strongest predictor of AI crawler visit frequency:
1,373.7: Avg. AI visits, sites with 50+ blog posts
41.6: Avg. AI visits, sites with no blog content
33x: Difference in crawler visits
This 33x difference is the largest effect size in the entire study, reinforcing that AI systems disproportionately target content-rich sites for real-time retrieval.
## 4. Business Impact: Crawled Sites Get 3.2x More Traffic
The study goes beyond crawl rates to measure business outcomes. Sites that received AI crawler visits consistently outperformed uncrawled sites:

| Metric | AI-Crawled Sites | Not Crawled | Multiplier |
| --- | --- | --- | --- |
| Avg. Human Sessions | 527.7 | 164.9 | 3.2x |
| Avg. Form Completions | 4.17 | 1.57 | 2.7x |
| Avg. Click-to-Call | 8.62 | 3.46 | 2.5x |

Correlation vs. causation caveat: Sites that attract AI crawlers tend to be better-optimized overall, so these multipliers reflect a correlation between AI crawl activity and general site quality, not proof that being crawled causes the gains. Indeed, the 90.5% crawl rate for sites with 10K+ sessions suggests that AI crawlers are drawn to sites that already have strong organic performance.
## 5. The Ghost Citation Problem: 62% of AI Citations Never Name You
Even if you win AI crawler attention and earn a citation in AI-generated answers, a separate problem looms: the AI probably won't mention your brand by name. Research from Kevin Indig, published in Growth Memo on April 21, 2026, quantifies what he calls the ghost citation problem.
3,981: Domains analyzed
115: Prompts tested
14: Countries
4: AI search engines tested
The study tested four AI search engines (ChatGPT, Google AI Overviews, Gemini, and Google AI Mode) and found that 62% of all citations are ghost citations. A ghost citation occurs when the AI includes a source link but never mentions the brand name in the answer text.

| Citation Behavior | % of Domains |
| --- | --- |
| Cited by AI (link provided) | 74.9% |
| Mentioned by name in answer | 38.3% |
| Both cited AND mentioned | 13.2% |
| Ghost citations (cited, never named) | 61.7% |

The brand visibility drop is severe: When AI cites your content without mentioning your brand, the effective citation rate drops from 53.1% to just 10.6%. You supply the facts, but the AI takes the credit.
The mechanism is structural, not random. Informational content (articles, guides, how-to pages) is the most vulnerable to ghost citation because the AI extracts facts without needing to endorse the source. Comparative and evaluative content ("best X for Y", product reviews, tool comparisons) generates brand mentions because the AI must name the entities being compared. This connects directly to the ChatGPT citation mechanics study showing only 1.93% of Reddit pages get cited despite heavy retrieval.
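The ghost-citation definition reduces to two flags per observation: was the domain linked, and was the brand named. The sketch below shows one way to compute the rates from such records. The `Observation` schema and `citation_summary` function are hypothetical, and the sample records are invented rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One domain's outcome for one prompt on one AI engine (hypothetical schema)."""
    domain: str
    cited: bool      # the answer linked to the domain
    mentioned: bool  # the brand name appeared in the answer text


def citation_summary(observations):
    """Return citation, mention, and ghost-citation rates as fractions."""
    total = len(observations)
    if total == 0:
        raise ValueError("need at least one observation")
    cited = sum(o.cited for o in observations)
    mentioned = sum(o.mentioned for o in observations)
    ghost = sum(o.cited and not o.mentioned for o in observations)
    return {
        "citation_rate": cited / total,
        "mention_rate": mentioned / total,
        "ghost_rate": ghost / total,
        # Share of citations that never name the brand.
        "ghost_share_of_citations": ghost / cited if cited else 0.0,
    }


# Invented sample data, not figures from the study.
sample = [
    Observation("example-a.com", cited=True, mentioned=False),
    Observation("example-b.com", cited=True, mentioned=True),
    Observation("example-c.com", cited=False, mentioned=True),
    Observation("example-d.com", cited=True, mentioned=False),
]

for metric, value in citation_summary(sample).items():
    print(f"{metric}: {value:.1%}")
```

The same subtraction reconciles the study's figures: 74.9% of domains cited minus 13.2% both cited and mentioned leaves 61.7% cited but never named.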
## 6. Platform Comparison: How Each AI Engine Handles Citations
Each AI search engine has a distinct citation personality, and understanding these differences is critical for prioritizing your GEO strategy.

| AI Engine | Citation Link Rate | Brand Mention Rate | Behavior |
| --- | --- | --- | --- |
| ChatGPT | 87.0% | 20.7% | High cite, low mention |
| Gemini | 21.4% | 83.7% | Low cite, high mention |
| Google AI Mode | Moderate | ~37.7% | Balanced |
| Google AI Overviews | Moderate-high | Moderate | Citation-leaning |

ChatGPT and Gemini are near-opposites. ChatGPT cites sources 87% of the time but only names brands 20.7% of the time: it gives you the link but rarely the brand visibility. Gemini does the reverse: it mentions brand names 83.7% of the time but only provides a clickable citation link 21.4% of the time.
Geographic Variation in Brand Mentions
Brand mention rates vary significantly by country.
The cross-engine disagreement rate is also notable: 22% of 454 prompt-domain combinations produced different mention outcomes across engines, meaning the same brand is named by one AI and ghosted by another for the same query.
Real-world example: Medium.com received 16 AI citations but zero brand mentions. Wikipedia got 27 citations but only 2 mentions. Instagram was named by ChatGPT and Gemini but ghosted by Google's own AI products.
## 7. Action Plan: Optimizing for Both AI Crawling and AI Citations
Combining findings from both studies, here is a concrete plan for improving both AI crawler visibility and brand citation quality.
For AI Crawl Visibility
1. Prioritize content depth over content breadth. The 33x difference in crawler visits between sites with 50+ posts and zero posts makes content volume the highest-leverage action. Publish substantive, informational blog content consistently.
2. Complete your structured data. Each additional local schema field adds roughly 2.7 percentage points of crawl probability. Complete all available schema fields; don't stop at the minimum required for rich results. Sync your Google Business Profile if applicable (92.8% vs. 58.9% crawl rate).
3. Build external data connections. Third-party integrations like Yext (97.1% crawl rate) and review platforms (89.8%) create additional signals that AI systems use for entity validation and discovery.
4. Don't block User Fetch crawlers. With 56.9% of AI crawler activity being real-time content retrieval, blocking these bots means blocking your visibility in AI answers. Review your robots.txt and consider allowing ChatGPT-User and similar user-fetch agents even if you block training bots.
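Before deploying a robots.txt change, you can check that it does what step 4 intends. The sketch below uses Python's standard-library `urllib.robotparser`. The policy shown is only an example, the URL is a placeholder, and `check_access` is a hypothetical helper. `GPTBot` and `ChatGPT-User` are the agents named in this article; other vendors' user-fetch agents would need their own entries.

```python
from urllib.robotparser import RobotFileParser

# Example policy: block the training crawler, allow the user-fetch agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Allow: /
"""


def check_access(robots_txt: str, agents, url: str) -> dict:
    """Map each user agent to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = check_access(
    ROBOTS_TXT,
    agents=["GPTBot", "ChatGPT-User"],
    url="https://www.example.com/blog/some-post",
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Keep in mind that robots.txt is advisory: it expresses your policy, but compliance depends on each crawler honoring it.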
For Brand Citation Quality
5. Create comparative and evaluative content. Informational content gets ghost-cited. Content that compares, evaluates, or recommends specific entities forces the AI to name brands. Shift your content mix toward "best X for Y", expert reviews, and tool comparisons.
6. Embed your brand in factual claims. When AI extracts a fact, it rarely attributes the source. When AI cites an opinion, finding, or unique methodology, it often names the author. Tie your brand to original data, proprietary frameworks, and named methodologies.
7. Monitor ghost citations. Only 22% of marketing teams have infrastructure to track AI citations. Use tools that can detect when your domain appears in AI answers and whether your brand is mentioned. Track both citation rate and mention rate separately. Our AI SEO Audit covers this analysis in depth.
What percentage of websites receive AI crawler visits?
According to an analysis of 858,457 websites in February 2026, 59% of sites received at least one AI crawler visit. Sites with over 10,000 human sessions had a 90.5% AI crawl rate, indicating that existing organic traffic strongly predicts AI crawler attention.
Which company sends the most AI crawlers?
OpenAI dominates AI crawling with 55.8 million visits out of 68.9 million total, representing 81.0% of all AI crawler traffic. Anthropic (Claude) is second at 16.6%, followed by Perplexity at 1.8% and Google Gemini at just 0.6%.
What is a ghost citation in AI search?
A ghost citation occurs when an AI search engine uses your content and includes a citation link to your site but never mentions your brand name in the answer text. Research across 3,981 domains found that 62% of all AI citations are ghost citations.
How does blog content volume affect AI crawler visits?
Sites with 50+ blog posts received an average of 1,373.7 AI crawler visits versus 41.6 for sites with no blog content: a 33x difference and the largest effect in the study.
Which AI search engine is best at mentioning brand names?
Gemini leads with an 83.7% brand mention rate but only generates citation links 21.4% of the time. ChatGPT does the opposite: it cites sources 87.0% of the time but only mentions brand names 20.7% of the time.
Does structured data help with AI crawler visibility?
Yes. Google Business Profile sync raised crawl rates from 58.9% to 92.8%. Local schema markup improved rates from 55.2% to 72.3%. Completing 10–11 schema fields reached 82% crawl rates. Third-party integrations like Yext achieved 97.1%.
About the Author
Francisco Leon de Vivero is VP of Growth at Growing Search and a global SEO expert with 15+ years of experience across enterprise, ecommerce, and international search. He previously led Global SEO Plan at Shopify and has spoken at UnGagged, SEonthebeach, and other international conferences.