68 Million AI Crawler Visits Reveal What Drives AI Search Visibility — Plus the Ghost Citation Problem

A study of 68.9 million AI crawler visits across 858,457 sites shows OpenAI controls 81% of AI crawl traffic. Separate research reveals 62% of AI citations are ghost citations where brands get a link but zero name recognition.

Updated April 22, 2026, by Francisco Leon de Vivero
A study published by Search Engine Journal on April 20, 2026, analyzed 68.9 million AI crawler visits across 858,457 websites during February 2026 — the most granular public look at AI crawler behavior yet. Separately, Kevin Indig's research across 3,981 domains reveals that 62% of all AI citations are ghost citations where the brand gets a link but zero name recognition in the answer text.

Together, these studies reshape what AI SEO actually means in practice: it's no longer about whether AI crawlers find you, but whether they credit you when they use your content.

68.9M: AI crawler visits analyzed
858,457: Sites in the dataset
81%: OpenAI's share of AI crawl traffic
62%: AI citations that are ghost citations

1. The AI Crawler Landscape: 68.9 Million Visits in One Month

The most significant finding is the shift in why AI bots crawl. The majority of AI crawler traffic is no longer about building training datasets. Instead, 56.9% of all AI crawler activity (39.8 million visits) is classified as User Fetch — real-time content retrieval triggered by a live user query in ChatGPT, Perplexity, or similar AI search interfaces.

Why this matters: AI crawlers are now primarily acting as intermediaries between your content and users asking questions right now. If your site blocks or throttles these bots, you're not just preventing training; you're preventing your content from appearing in real-time AI answers.

| Crawl Purpose | Share | Volume | Primary Use |
|---|---|---|---|
| User Fetch | 56.9% | 39.8M | Real-time answers to live queries |
| Training | 28.8% | ~19.8M | Model learning via GPTBot and others |
| Discovery | 14.3% | ~9.9M | Content indexing across multiple systems |

This aligns with trends covered in our analysis of AI crawler visit patterns and Stanford's adoption research — the shift from training crawlers to real-time retrieval bots is accelerating.
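
To see how this split looks on your own site, you can bucket server-log hits by user-agent token. The sketch below is a minimal illustration, not the study's method: the tokens are ones OpenAI, Anthropic, and Perplexity have published for their bots, but the mapping of each token to User Fetch, Training, or Discovery is an assumption made here, and `classify_user_agent` and `purpose_shares` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, List, Optional

# Assumption: this token-to-purpose mapping is a guess at how vendor bots line
# up with the study's three categories. Check each vendor's current docs.
PURPOSE_BY_TOKEN = {
    "ChatGPT-User": "user_fetch",
    "Claude-User": "user_fetch",
    "Perplexity-User": "user_fetch",
    "GPTBot": "training",
    "ClaudeBot": "training",
    "OAI-SearchBot": "discovery",
    "PerplexityBot": "discovery",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the assumed crawl purpose for a User-Agent string, or None."""
    lowered = user_agent.lower()
    for token, purpose in PURPOSE_BY_TOKEN.items():
        if token.lower() in lowered:
            return purpose
    return None  # not a recognized AI bot


def purpose_shares(user_agents: List[str]) -> Dict[str, float]:
    """Percentage share of each purpose among recognized AI bot hits."""
    counts = Counter()
    for user_agent in user_agents:
        purpose = classify_user_agent(user_agent)
        if purpose is not None:
            counts[purpose] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {purpose: round(100 * n / total, 1) for purpose, n in counts.items()}


# Made-up log sample: one training hit, two user fetches, one human browser.
sample_hits = [
    "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
    "ChatGPT-User/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0",
]
print(purpose_shares(sample_hits))
```

User-agent strings can be spoofed, so treat the result as a rough signal unless you also verify the requesting IP ranges.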

2. Who Is Crawling: OpenAI Owns 81% of AI Bot Traffic

The concentration of AI crawler traffic is extreme. OpenAI accounts for 81.0% of all AI crawler visits (55.8 million out of 68.9 million), making it the dominant force in AI web crawling by an enormous margin.

| Company | Visits | Market Share |
|---|---|---|
| OpenAI | 55.8 million | 81.0% |
| Anthropic (Claude) | 11.5 million | 16.6% |
| Perplexity | 1.3 million | 1.8% |
| Google (Gemini) | 380,000 | 0.6% |

Google's low crawler volume is notable. At just 380,000 visits (0.6%), Gemini's crawling footprint is 147 times smaller than OpenAI's. This likely reflects Google's ability to leverage its existing Googlebot index rather than deploying separate AI-specific crawlers.

Year-Over-Year LLM Referral Traffic Growth

Referral traffic from LLM-powered search is growing rapidly, with some platforms showing explosive growth:

| Platform | Previous Period | Current Period | Growth |
|---|---|---|---|
| Total LLM Referrals | 93,484 | 161,469 | +72.7% |
| ChatGPT | 81,652 | 136,095 | +66.7% |
| Claude | 106 | 2,488 | +2,247% (23x) |
| Copilot | 22 | 9,560 | From near-zero |
| Perplexity | 11,533 | 13,157 | +14.1% |

Claude referral traffic grew 23x year-over-year. While ChatGPT still dominates total referral volume (136,095 visits), Claude's jump from 106 to 2,488 referrals and Copilot's surge from 22 to 9,560 show that the LLM referral channel is diversifying rapidly.
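
The growth figures can be reproduced with a small helper, which is also handy for running the same comparison on your own analytics exports. This is a minimal sketch: `growth_pct` is a hypothetical helper name, and the inputs are the previous-period and current-period counts quoted in this section.

```python
def growth_pct(previous: float, current: float) -> float:
    """Percentage change from the previous period to the current one."""
    if previous <= 0:
        raise ValueError("previous must be positive to compute growth")
    return (current - previous) / previous * 100


# (previous period, current period) referral counts from the article.
REFERRALS = {
    "Total LLM Referrals": (93_484, 161_469),
    "ChatGPT": (81_652, 136_095),
    "Claude": (106, 2_488),
    "Perplexity": (11_533, 13_157),
}

for platform, (prev, curr) in REFERRALS.items():
    print(f"{platform}: {growth_pct(prev, curr):+.1f}% ({curr / prev:.1f}x)")
```

Copilot is left out on purpose: with a base of only 22 referrals, a percentage figure says little, which is presumably why the table labels it "From near-zero".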

3. What Drives AI Crawl Rates: Integrations, Schema, and Content Depth

The study isolates three categories of signals that predict higher AI crawl rates. Each contributes independently, and the effect compounds when combined.

Third-Party Integrations

| Integration | Crawl Rate (With) | Crawl Rate (Without) | Difference |
|---|---|---|---|
| Yext | 97.1% | ~58% | +38.9pp |
| Reviews Integration | 89.8% | 58.8% | +31.0pp |

Sites with Yext integration achieved a 97.1% crawl rate, meaning nearly every Yext-connected site in the dataset was visited by at least one AI crawler. The likely mechanism: Yext syndication distributes business data across the web, creating more reference points for AI systems to discover and validate.

Structured Data and Business Profile Signals

| Feature | Crawl Rate (With) | Crawl Rate (Without) | Lift |
|---|---|---|---|
| Google Business Profile Sync | 92.8% | 58.9% | +33.9pp |
| Local Schema Markup | 72.3% | 55.2% | +17.1pp |
| Dynamic Pages | 69.4% | 58.2% | +11.2pp |
| Ecommerce | 54.2% | 59.2% | -5.0pp |

Ecommerce sites show a negative crawl correlation (-5.0pp). This may reflect that many e-commerce product pages lack the informational content depth that AI crawlers prioritize. Product catalogs with thin descriptions get deprioritized relative to content-rich informational sites.

The granularity of structured data matters. Sites with no schema fields completed had a 55.2% crawl rate, while sites with 10–11 fields completed reached 82%, a 26.8 percentage point improvement. Averaged across those fields, that is roughly 2.7 percentage points of crawl probability per additional completed field. This reinforces findings from our Cloudflare Agent Readiness Score analysis on structured data's role in AI visibility.
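
One way to act on this is to generate your local schema markup from a single record and check which fields are still empty. The sketch below builds a schema.org LocalBusiness JSON-LD object in Python. The study does not say which 10–11 fields it counted, so `TRACKED_FIELDS` is an assumed checklist of common LocalBusiness properties, and the business details are placeholders.

```python
import json

# Assumption: the study does not list the fields it counted. These are common
# schema.org LocalBusiness properties, used here as a stand-in checklist.
TRACKED_FIELDS = [
    "name", "description", "url", "telephone", "address", "geo",
    "openingHours", "image", "priceRange", "sameAs",
]


def build_local_business(fields: dict) -> dict:
    """Build a LocalBusiness JSON-LD object, dropping empty values."""
    doc = {"@context": "https://schema.org", "@type": "LocalBusiness"}
    doc.update({key: value for key, value in fields.items() if value})
    return doc


def missing_fields(doc: dict) -> list:
    """Tracked fields that are absent from the JSON-LD object."""
    return [field for field in TRACKED_FIELDS if field not in doc]


business = build_local_business({
    "name": "Example Bakery",  # placeholder business
    "url": "https://bakery.example",
    "telephone": "+1-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Main St",
        "addressLocality": "Springfield",
    },
    "openingHours": "Mo-Fr 08:00-17:00",
    "description": "",  # empty, so it is dropped from the output
})

print(json.dumps(business, indent=2))
print("Still missing:", missing_fields(business))
```

The JSON output would go inside a `<script type="application/ld+json">` tag on the page. The missing-field list is the part that maps to the finding above: it shows how far a profile is from being fully completed.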

Content Depth: The 33x Multiplier

Content volume is the single strongest predictor of AI crawler visit frequency:

1,373.7: Avg. AI visits (sites with 50+ blog posts)
41.6: Avg. AI visits (sites with no blog content)
33x: Difference in crawler visits

This 33x difference is the largest effect size in the entire study, reinforcing that AI systems disproportionately target content-rich sites for real-time retrieval.

4. Business Impact: Crawled Sites Get 3.2x More Traffic

The study goes beyond crawl rates to measure business outcomes. Sites that received AI crawler visits consistently outperformed uncrawled sites:

| Metric | AI-Crawled Sites | Not Crawled | Multiplier |
|---|---|---|---|
| Avg. Human Sessions | 527.7 | 164.9 | 3.2x |
| Avg. Form Completions | 4.17 | 1.57 | 2.7x |
| Avg. Click-to-Call | 8.62 | 3.46 | 2.5x |

Correlation vs. causation caveat: Sites that attract AI crawlers tend to be better optimized overall, so these multipliers show a correlation between AI crawl activity and general site quality, not proof that being crawled causes the extra traffic. The 90.5% crawl rate for sites with 10K+ sessions points the same way: AI crawlers appear to be drawn to sites that already have strong organic performance.

5. The Ghost Citation Problem: 62% of AI Citations Never Name You

Even if you win AI crawler attention and earn a citation in AI-generated answers, a separate problem looms: the AI probably won't mention your brand by name. Research from Kevin Indig, published in Growth Memo on April 21, 2026, quantifies what he calls the ghost citation problem.

3,981: Domains analyzed
115: Prompts tested
14: Countries
4: AI search engines tested

The study tested four AI search engines — ChatGPT, Google AI Overviews, Gemini, and Google AI Mode — and found that 62% of all citations are ghost citations. A ghost citation occurs when the AI includes a source link but never mentions the brand name in the answer text.

| Citation Behavior | % of Domains |
|---|---|
| Cited by AI (link provided) | 74.9% |
| Mentioned by name in answer | 38.3% |
| Both cited AND mentioned | 13.2% |
| Ghost citations (cited, never named) | 61.7% |

The brand visibility drop is severe: When AI cites your content without mentioning your brand, the effective citation rate drops from 53.1% to just 10.6%. You supply the facts, but the AI takes the credit.

The mechanism is structural, not random. Informational content (articles, guides, how-to pages) is the most vulnerable to ghost citation because the AI extracts facts without needing to endorse the source. Comparative and evaluative content ("best X for Y", product reviews, tool comparisons) generates brand mentions because the AI must name the entities being compared. This connects directly to the ChatGPT citation mechanics study showing only 1.93% of Reddit pages get cited despite heavy retrieval.
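
The definition of a ghost citation is simple enough to check programmatically once you have an answer's text and its list of source links. The sketch below is one possible reading of that definition, not Indig's actual methodology: `classify_citation` is a hypothetical helper, and it assumes an exact domain match (ignoring a leading `www.`) and a whole-word, case-insensitive brand match.

```python
import re
from urllib.parse import urlparse


def classify_citation(answer_text: str, cited_urls: list, domain: str, brand: str) -> str:
    """Label one domain's outcome in a single AI answer.

    Returns "cited_and_mentioned", "ghost_citation", "mention_only", or "absent".
    """
    # Hostnames of every cited link, with a leading "www." stripped.
    hosts = {(urlparse(url).hostname or "").removeprefix("www.") for url in cited_urls}
    cited = domain.lower() in hosts
    # Whole-word, case-insensitive search for the brand name in the answer.
    pattern = rf"\b{re.escape(brand)}\b"
    mentioned = re.search(pattern, answer_text, re.IGNORECASE) is not None

    if cited and mentioned:
        return "cited_and_mentioned"
    if cited:
        return "ghost_citation"
    if mentioned:
        return "mention_only"
    return "absent"


# Placeholder answer and source list for a made-up brand.
answer = "Sourdough needs a long, cool fermentation to develop flavor."
sources = ["https://www.bakery.example/guides/sourdough"]
print(classify_citation(answer, sources, "bakery.example", "Example Bakery"))
```

Here the domain is linked but the brand never appears in the text, so the call is labeled a ghost citation. Brands with several spellings or product names would need a list of aliases rather than a single string.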

6. Platform Comparison: How Each AI Engine Handles Citations

Each AI search engine has a distinct citation personality, and understanding these differences is critical for prioritizing your GEO strategy.

| AI Engine | Citation Link Rate | Brand Mention Rate | Behavior |
|---|---|---|---|
| ChatGPT | 87.0% | 20.7% | High cite, low mention |
| Gemini | 21.4% | 83.7% | Low cite, high mention |
| Google AI Mode | Moderate | ~37.7% | Balanced |
| Google AI Overviews | Moderate-high | Moderate | Citation-leaning |

ChatGPT and Gemini are near-opposites. ChatGPT cites sources 87% of the time but only names brands 20.7% of the time — it gives you the link but rarely the brand visibility. Gemini does the reverse: it mentions brand names 83.7% of the time but only provides a clickable citation link 21.4% of the time.

Geographic Variation in Brand Mentions

Brand mention rates vary significantly by country, which matters for international SEO strategy:

50%: India & Sweden (highest mention rates)
~35%: UK & Canada (above global average)
18–22%: Italy, Brazil, Netherlands (lowest)

The cross-engine disagreement rate is also notable: 22% of 454 prompt-domain combinations produced different mention outcomes across engines, meaning the same brand is named by one AI and ghosted by another for the same query.

Real-world example: Medium.com received 16 AI citations but zero brand mentions. Wikipedia got 27 citations but only 2 mentions. Instagram was named by ChatGPT and Gemini but ghosted by Google's own AI products.
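
If you track your own prompts across engines, a cross-engine disagreement rate like the one above can be computed from a flat list of observations. This is a sketch under assumptions: the study's exact definition of disagreement is not given here, so the code counts a prompt-domain pair as a disagreement when at least one engine mentions the brand and at least one does not. `disagreement_rate` and the sample rows are hypothetical.

```python
from collections import defaultdict


def disagreement_rate(observations) -> float:
    """Share of (prompt, domain) pairs where engines differ on brand mention.

    observations: iterable of (prompt, domain, engine, mentioned) tuples.
    Pairs observed on fewer than two engines are skipped.
    """
    outcomes = defaultdict(dict)
    for prompt, domain, engine, mentioned in observations:
        outcomes[(prompt, domain)][engine] = bool(mentioned)

    comparable = [by_engine for by_engine in outcomes.values() if len(by_engine) >= 2]
    if not comparable:
        return 0.0
    # A pair is "split" when its engines produced both True and False.
    split = sum(1 for by_engine in comparable if len(set(by_engine.values())) > 1)
    return split / len(comparable)


# Made-up observations: the first pair is split, the second is unanimous.
rows = [
    ("best sourdough guide", "bakery.example", "ChatGPT", True),
    ("best sourdough guide", "bakery.example", "Google AI Mode", False),
    ("how to proof dough", "bakery.example", "ChatGPT", False),
    ("how to proof dough", "bakery.example", "Gemini", False),
]
print(f"{disagreement_rate(rows):.0%} of comparable pairs disagree")
```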

7. Action Plan: Optimizing for Both AI Crawling and AI Citations

Combining findings from both studies, here is a concrete framework for improving both AI crawler visibility and brand citation quality.

For AI Crawl Visibility

1. Prioritize content depth over content breadth. The 33x difference in crawler visits between sites with 50+ posts and zero posts makes content volume the highest-leverage action. Publish substantive, informational blog content consistently.
2. Complete your structured data. Each additional local schema field adds roughly 2.7 percentage points of crawl probability. Complete all available schema fields — don't stop at the minimum required for rich results. Sync your Google Business Profile if applicable (92.8% vs. 58.9% crawl rate).
3. Build external data connections. Third-party integrations like Yext (97.1% crawl rate) and review platforms (89.8%) create additional signals that AI systems use for entity validation and discovery.
4. Don't block User Fetch crawlers. With 56.9% of AI crawler activity being real-time content retrieval, blocking these bots means blocking your visibility in AI answers. Review your robots.txt and consider allowing ChatGPT-User and similar user-fetch agents even if you block training bots.
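
The last item can be checked before you deploy a robots.txt change. The sketch below uses Python's standard-library `urllib.robotparser` to confirm that a draft file blocks a training bot while still allowing a user-fetch agent. The robots.txt content and the `example.com` URL are illustrative assumptions, not a recommended policy; confirm the current user-agent tokens in each vendor's documentation.

```python
from urllib.robotparser import RobotFileParser

# Illustrative draft: block OpenAI's training crawler, allow its user-fetch agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the draft without fetching anything

TEST_URL = "https://example.com/blog/post"  # placeholder URL
for agent in ("GPTBot", "ChatGPT-User"):
    verdict = "allowed" if parser.can_fetch(agent, TEST_URL) else "blocked"
    print(f"{agent}: {verdict}")
```

robots.txt is advisory, so this only tells you what a compliant bot should do. Firewall or CDN rules that block by user agent or IP need to be reviewed separately.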

For Brand Citation Quality

5. Create comparative and evaluative content. Informational content gets ghost-cited. Content that compares, evaluates, or recommends specific entities forces the AI to name brands. Shift your content mix toward "best X for Y", expert reviews, and tool comparisons.
6. Attach your brand to original claims. When AI extracts a generic fact, it rarely attributes the source. When AI cites an opinion, finding, or unique methodology, it often names the author. Tie your brand to original data, proprietary frameworks, and named methodologies.
7. Monitor ghost citations. Only 22% of marketing teams have infrastructure to track AI citations. Use tools that can detect when your domain appears in AI answers and whether your brand is mentioned. Track both citation rate and mention rate separately. Our AI SEO Audit covers this analysis in depth.

[Infographic: AI crawler visit distribution across 858,457 sites, OpenAI's 81% market share, the ghost citation problem affecting 62% of AI citations, and platform comparison of citation vs brand mention rates]

Frequently Asked Questions

What percentage of websites receive AI crawler visits?

According to an analysis of 858,457 websites in February 2026, 59% of sites received at least one AI crawler visit. Sites with over 10,000 human sessions had a 90.5% AI crawl rate, indicating that existing organic traffic strongly predicts AI crawler attention.

Which company sends the most AI crawlers?

OpenAI dominates AI crawling with 55.8 million visits out of 68.9 million total, representing 81.0% of all AI crawler traffic. Anthropic (Claude) is second at 16.6%, followed by Perplexity at 1.8% and Google Gemini at just 0.6%.

What is a ghost citation in AI search?

A ghost citation occurs when an AI search engine uses your content and includes a citation link to your site but never mentions your brand name in the answer text. Research across 3,981 domains found that 62% of all AI citations are ghost citations.

How does blog content volume affect AI crawler visits?

Sites with 50+ blog posts received an average of 1,373.7 AI crawler visits versus 41.6 for sites with no blog content — a 33x difference and the largest effect in the study.

Which AI search engine is best at mentioning brand names?

Gemini leads with an 83.7% brand mention rate but only generates citation links 21.4% of the time. ChatGPT does the opposite: it cites sources 87.0% of the time but only mentions brand names 20.7% of the time.

Does structured data help with AI crawler visibility?

Yes. Google Business Profile sync raised crawl rates from 58.9% to 92.8%. Local schema markup improved rates from 55.2% to 72.3%. Completing 10–11 schema fields reached 82% crawl rates. Third-party integrations like Yext achieved 97.1%.

About the Author

Francisco Leon de Vivero is VP of Growth at Growing Search and a global SEO expert with 15+ years of experience across enterprise, ecommerce, and international search. He previously led Global SEO Framework at Shopify and has spoken at UnGagged, SEonthebeach, and other international conferences.
