AI Search Is Contaminating Itself: The Retrieval Poisoning Crisis and What Google Click Signals Actually Do
56% of correct Google AI Overview answers are ungrounded: the cited sources don't actually support them. Synthetic SEO content is poisoning RAG systems in real time. Plus: DOJ documents reveal how Navboost and RankEmbedBERT actually process click data.
AI search systems are contaminating their own outputs through a real-time retrieval loop that requires no retraining cycle to spread misinformation. An Oumi analysis of 4,326 AI Overview responses found that while 85–91% appear accurate on the surface, 56% of correct answers are ungrounded — the cited sources don't actually support the claims. Separately, DOJ antitrust documents finally clarify how Google actually uses click data through Navboost and RankEmbedBERT.
Together, these findings expose two fundamental misunderstandings in the SEO industry: that AI citations equal trustworthiness, and that clicks directly influence rankings. Neither is true — and the gap between perception and reality is widening.
1. The Retrieval Poisoning Crisis: AI Search Is Eating Itself
Unlike traditional model contamination (which requires retraining over months), RAG-based systems like Google AI Overviews, Perplexity, and ChatGPT fetch live web content and present it as authoritative answers. When that live content is itself AI-generated, hallucinated, or fabricated, the contamination is instantaneous. The retrieval layer is not a filter — it is the infection vector.
This is fundamentally different from the "model collapse" researchers have warned about. Model collapse is a slow degradation over training cycles. Retrieval poisoning is real-time. A speculative blog post published at 9 AM can be cited as authoritative fact by 10 AM. This dynamic connects to the ghost citation problem — AI systems are citing content without verifying it, and now without even verifying that the citations support the claims.
2. The Numbers: How Bad Is the Contamination?
| Metric | Finding | Source |
|---|---|---|
| AI Overview surface accuracy | 85–91% across 4,326 tests | Oumi analysis |
| Ungrounded correct answers | 56% cite unsupportive sources | Oumi analysis |
| ChatGPT "best X" listicle citations | 44% of all citations | Ahrefs study |
| GPT-5.4 vs GPT-5.3 false claims | Paid tier produces 33% fewer | SEJ analysis |
| Free-tier OpenAI users | 94% use less reliable versions | SEJ analysis |
The Oumi analysis reveals a critical distinction between surface accuracy and grounded accuracy. A response can sound correct while citing sources that don't actually support the claim. Over half of all "correct" answers fall into this category — they give the illusion of citation-backed authority without the substance. Across 5,380 sources analyzed, Facebook and Reddit ranked as the second and fourth most-cited platforms — neither of which has mechanisms to verify human authorship or factual accuracy.
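The surface-vs-grounded distinction can be made concrete as two separate metrics over the same labeled evaluations. A minimal sketch follows; the field names and the sample records are illustrative, not taken from the Oumi dataset:

```python
# Sketch: surface accuracy vs. grounded accuracy over labeled evaluations.
# Each record marks whether the answer was factually correct ("surface")
# and whether the cited sources actually supported the claim ("grounded").
# Field names and sample data are hypothetical.

def accuracy_breakdown(evals):
    total = len(evals)
    correct = [e for e in evals if e["correct"]]
    grounded = [e for e in correct if e["supported_by_citations"]]
    return {
        "surface_accuracy": len(correct) / total,
        "grounded_accuracy": len(grounded) / total,
        # share of "correct" answers whose citations don't hold up
        "ungrounded_share_of_correct": 1 - len(grounded) / max(len(correct), 1),
    }

sample = [
    {"correct": True,  "supported_by_citations": True},
    {"correct": True,  "supported_by_citations": False},
    {"correct": True,  "supported_by_citations": False},
    {"correct": False, "supported_by_citations": False},
]
print(accuracy_breakdown(sample))
```

The point of separating the metrics: a system can score well on `surface_accuracy` while `ungrounded_share_of_correct` stays high, which is exactly the illusion of citation-backed authority the analysis describes.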
3. The Mechanism: Why RAG Systems Are the Infection Vector
Two academic papers demonstrate the structural vulnerability. PoisonedRAG (Zou et al., 2024) showed that a small number of crafted passages can control RAG system outputs without compromising the model itself — injecting content into the retrieval corpus is sufficient. BadRAG (Xue et al., 2024) demonstrated semantic backdoors enabling similar manipulation through content designed to trigger specific retrieval patterns.
The practical attack chain works like this: an AI content pipeline generates a speculative article → the article gets indexed within hours → a RAG system fetches it during a user query and cites it → other AI pipelines observe the citation and reference the same content → the fabricated claim becomes "consensus" across multiple AI systems without any human verification.
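The structural weakness those papers exploit can be illustrated with a toy term-overlap retriever: a single passage stuffed with the target query's wording outranks legitimate documents. The corpus, query, and scoring function below are hypothetical; production RAG stacks use embedding similarity, but the failure mode is analogous:

```python
# Toy illustration of corpus injection: one crafted passage, engineered
# around a target query, wins retrieval against legitimate documents.
# The retriever and corpus are hypothetical stand-ins for a real RAG stack.

def score(query, doc):
    q, d = set(query.lower().split()), doc.lower().split()
    return sum(1 for w in d if w in q) / len(d)  # query-term density

corpus = [
    "The March core update adjusted how Google weighs helpful content.",
    "Official documentation describes ranking systems in general terms.",
]
# Attacker injects a passage built entirely from the target query's terms
# (echoing the nonexistent update Perplexity was observed citing).
poisoned = ("september 2025 perspective core algorithm update confirmed: "
            "the september 2025 perspective core algorithm update changes rankings")
corpus.append(poisoned)

query = "september 2025 perspective core algorithm update"
top = max(corpus, key=lambda d: score(query, d))
print(top == poisoned)  # the injected passage dominates retrieval
```

No model weights are touched at any point: controlling what the retriever sees is sufficient, which is why the retrieval layer, not the model, is the infection vector.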
xAI's Grokipedia exemplifies the endpoint of this trend: an AI-rewritten encyclopedia that bases articles on contaminated web content, including Instagram Reels as sources, with no human accountability mechanism for correcting errors.
4. The SEO Industry's Role in the Contamination Loop
The irony is acute: the SEO industry is simultaneously the victim and the accelerant of this crisis. When AI Overviews and AI search tools began capturing traffic that previously went to publishers, agencies responded by deploying AI content pipelines at scale. But the content these pipelines generate — speculative algorithm analyses, "best X" roundups, generic how-to articles — became the raw material that other AI systems now cite.
This connects to the ChatGPT citation mechanics research showing that 44% of ChatGPT citations are "best X" listicles — the exact content formats that AI pipelines produce at highest volume, typically structured around self-interested product rankings rather than independent evaluation.
Meanwhile, human creators are abandoning the open web as the traffic bargain collapses. The content that would provide genuine first-hand expertise is increasingly published behind paywalls, in newsletters, or not at all — leaving the open web to synthetic content that AI systems will continue to ingest and cite. The zero-click survival strategies we covered earlier become even more critical in this context.
5. Google Click Signals: What the DOJ Documents Actually Reveal
DOJ antitrust documents from September 2025 cut through persistent myths about how Google uses click data. The key finding: clicks are the lowest-level data point, not a ranking factor. They are processed, aggregated, and transformed before influencing anything.
How Click Data Actually Flows Through Google's Systems
| Processing Path | System | What Happens |
|---|---|---|
| AI Model Training | RankEmbedBERT | Click data combined with human rater scores trains ranking models. Uses 1/100th the data of earlier models while producing higher quality results. |
| Aggregate Measurement | Click Fraction formula | Individual clicks are summed and normalized into statistical measures, then smoothed to prevent spam manipulation. |
| Popularity Signals | Navboost | Measures popularity through aggregate user feedback — not individual click tracking. |
The Click Fraction Formula
A 2006 Google patent describes how individual clicks become aggregate signals:
```
LCC_BASE = #WC(Q,D) / (#C(Q,D) + S0)

// #WC(Q,D) = weighted click count for query Q and document D
// #C(Q,D)  = total click count for that query-document pair
// S0       = smoothing constant to prevent gaming
```
The smoothing constant S0 is critical: it prevents low-volume queries from being gamed by artificial clicks. Individual click manipulation is diluted by the normalization process. This is not a "more clicks = higher ranking" system — it's a statistical aggregation designed to resist exactly that kind of manipulation.
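The damping effect is easy to see numerically. A minimal sketch of the patent's formula follows; the actual value of S0 is not public, so the constant here is purely illustrative:

```python
# Sketch of the click-fraction formula from the 2006 patent:
# LCC_BASE = #WC(Q,D) / (#C(Q,D) + S0)
# The smoothing constant S0 in the denominator damps low-volume queries,
# so a handful of artificial clicks barely moves the signal.
# S0 = 100 is an assumption for illustration; the real value is unknown.

def click_fraction(weighted_clicks, total_clicks, s0=100):
    return weighted_clicks / (total_clicks + s0)

# A bot buys 10 "perfect" clicks on a low-volume query...
gamed_low_volume = click_fraction(10, 10)          # 10 / 110, heavily damped
# ...versus genuine aggregate behavior on a high-volume query.
organic_high_volume = click_fraction(8000, 10000)  # 8000 / 10100

print(gamed_low_volume, organic_high_volume)
```

Even with every click "won," the gamed query's signal stays near zero until total volume dwarfs S0, which is the anti-manipulation property the paragraph above describes.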
RankEmbedBERT: Less Data, Better Results
The DOJ documents reveal that RankEmbedBERT is trained on 1/100th the data of its predecessors while producing higher quality search results. This suggests Google has shifted from quantity-dependent approaches to architectures that extract more signal from less data — making the quality of training signals (including click-derived ones) more important than their volume.
6. Google's GEO Job Posting: A Mixed Signal
Google's ads organization posted a "GEO Partner Manager, Performance Solutions" role within its Large Customer Sales team. The listing mentions "Generative Engine Optimization" seven times and references analyzing "Share of Model" — a brand's visibility in AI-generated answers.
This is worth monitoring but not overstating. It represents one hiring signal from Google's advertising sales organization. The practical implication: Google's ads team sees commercial opportunity in the GEO space, even if the search quality team doesn't endorse the framework. The "Share of Model" metric is the most interesting element — if Google develops tooling to measure brand visibility within AI-generated answers, that's a signal that AI answer optimization will eventually become a paid advertising surface, not just an organic discovery channel.
Frequently Asked Questions
What is retrieval-layer poisoning in AI search?
Retrieval-layer poisoning occurs when RAG-based AI search systems fetch live web content that contains AI-generated misinformation, then cite it as factual. Unlike training-data contamination which requires retraining cycles, retrieval poisoning happens in real time — a fabricated article can be indexed and cited within 24 hours.
What percentage of Google AI Overview citations are ungrounded?
According to an Oumi analysis of 4,326 AI Overview tests, while 85–91% showed surface accuracy, 56% of correct answers were ungrounded — the cited sources did not actually support the claims being made.
Does Google use clicks as a direct ranking factor?
No. According to DOJ antitrust documents from September 2025, clicks are the lowest-level data point that gets processed into higher-level signals. Google aggregates click data into statistical measures and uses it to train AI models like RankEmbedBERT. Individual clicks do not directly rank websites.
What is Navboost and how does it affect rankings?
Navboost is a Google ranking system that measures popularity through aggregate user feedback. It processes aggregated click data — not individual clicks — to create signals about user satisfaction and content relevance.
How does synthetic SEO content create a contamination loop?
SEO agencies deploy AI content pipelines that generate speculative articles. Other AI pipelines cite those articles as sources. RAG systems fetch this content in real time and present it as factual. A documented example: Perplexity cited a nonexistent "September 2025 Perspective Core Algorithm Update" sourced entirely from AI-generated SEO blogs.
What is Google's position on Generative Engine Optimization (GEO)?
Google sends mixed signals. Gary Illyes stated that standard SEO suffices for AI Overviews. However, Google's ads organization posted a "GEO Partner Manager" role mentioning GEO seven times and referencing "Share of Model" analysis. The search and ads teams appear misaligned.
What is "Share of Model" and why does it matter?
Share of Model measures a brand's visibility in AI-generated answers — how often a brand appears when AI systems respond to relevant queries. It represents a shift from traditional Share of Voice metrics toward measuring influence within AI answer engines, and may signal future paid advertising surfaces.
