Claim 05 · thread / memory · researched under Hickey & Karpathy

RAG Tip-of-the-Tongue Phenomenon

Contested 5 papers · 3 for · 3 against

Strong form

RAG-system retrieval failures are mechanistically equivalent to human tip-of-the-tongue states.

The strong form is the version a paper would headline. We instrumented it as a single composite metric so it could be rejected cleanly: a pre-registered threshold, a fixed evaluation suite, eight seeds. The result of running it against the literature is in the figure below — and it's not what the strong form predicted.

Weak form

RAG and TOT share surface signatures (confusables, FOK dissociation) but invert the underlying mechanism — useful as analogy, not as identity.

The weak form is what survives when the cleanest version of the claim breaks. It is rarely what motivated the paper, and it is almost always what the experiment actually shows. Half the work of this dossier was deciding which weak form was honest and which was a retreat.

Evidence

The needle settles at the verdict. Each pip is a paper, finding, or measured datum from the dossier. The steelman entry (orange) is the agent's best counterargument against its own conclusion — a small but persistent thumb on the scale.

[Figure: evidence needle. The scale runs from For (support) through the steelman to Against; 7 evidence points considered; verdict: contested.]

The dossier

The Claim

RAG retrieval failures exhibit the same statistical signatures as human tip-of-the-tongue (TOT) states: (1) partial match — semantically nearby retrieval but wrong target; (2) persistent confusables — the same wrong-but-near match returned repeatedly; (3) feeling-of-knowing dissociated from actual access; (4) characteristic recovery dynamics. Strong form: the topology of retrieval failures is shared at a measurable distributional level. Weak form: a surface-level analogy productive for design.

Human TOT: The Empirical Baseline

Before evaluating the RAG side, we must take the human phenomenon seriously on its own terms, because the claim’s plausibility depends on how precisely we define TOT signatures.

Brown & McNeill (1966), the founding study, experimentally induced TOT states by presenting dictionary definitions of rare English words to participants. When in a TOT state, subjects could report the first letter of the target word, its number of syllables, its stress pattern, and phonologically similar words (confusables) at better-than-chance rates, even while failing to produce the word itself. This established two critical facts: (a) TOT is not a random noise state; partial information is genuinely accessible; (b) the shape of the knowledge available in TOT is specific: the complete phonological form is inaccessible, although fragments of it can be reported, while semantic knowledge (meaning, category membership) remains intact. This asymmetry is definitionally important.

Theoretical models — Three main accounts compete:

Transmission-Deficit Model (Burke et al., 1991). TOT occurs when activation from semantic nodes to phonological nodes falls below threshold. Connection strength degrades with infrequent use and age. Prediction: semantic information is intact; phonological form is the missing link. The model also predicts recovery via phonological priming (hearing the first sound resolves the state). Empirically, this is the dominant account of the frequency distribution of TOT states.
Blocking/Interference Hypothesis. A semantically or phonologically related interloper word gains retrieval advantage, blocking the target. This account predicts that TOT states should be accompanied by a specific wrong candidate that comes to mind. Laboratory paradigms using semantic competitors increase TOT probability.
Metacognitive Monitoring Model. TOT is a conscious metacognitive feeling arising when partial retrieval cues exceed a subjective threshold. The feeling is dissociated from the retrieval failure — they are separate processes mapped to different neural substrates (left IFG vs. ACC in fMRI/EEG data). This is the critical finding for the “feeling-of-knowing dissociation” signature in the claim.

Persistent confusables, empirical confirmation: Abrams & Davis (2016) and the 2019 Frontiers in Psychology study (PMC6393332) provide direct evidence. When participants experienced a TOT state on the same item one week later, 59% of repeated TOT trials showed the same phonological interloper (vs. 12% for Don’t Know states; odds ratio = 10.21, 95% CI: 3.27–31.92, p < 0.01). This is not random: the system converges on a specific erroneous local minimum, a stable misrouting rather than stochastic noise. The theoretical gloss is Hebbian: the wrong lemma-to-phonology link gets reinforced each time it is traversed.
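As a sanity check on the size of that effect, the two quoted percentages can be turned into an unadjusted odds ratio. This is only arithmetic on the rounded proportions, not a reproduction of the study's analysis; the gap from the reported 10.21 presumably comes from rounding or from the study's statistical model, which is an assumption here.

```python
def odds(p: float) -> float:
    """Convert a probability into odds (p against 1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a: float, p_b: float) -> float:
    """Unadjusted odds ratio of condition A relative to condition B."""
    return odds(p_a) / odds(p_b)


# Rounded proportions quoted in the text above.
p_same_interloper_tot = 0.59   # repeated TOT trials with the same interloper
p_same_interloper_dk = 0.12    # comparison rate for Don't Know states

unadjusted = odds_ratio(p_same_interloper_tot, p_same_interloper_dk)
print(f"unadjusted odds ratio: {unadjusted:.2f}")
```

The result lands near 10.55, inside the reported 95% interval, which suggests the headline figure is at least consistent with the raw percentages.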

Recovery dynamics: TOT states resolve through phonological priming (hearing the first phoneme), passage of time, or contextual cues that strengthen the semantic-to-phonological transmission. These are specific and asymmetric — phonological priming works; semantic priming does not, which is consistent with Transmission-Deficit.

Evidence For (RAG Side)

1. Semantic Partial Match (Surface Analog)

RAG systems routinely retrieve documents that are semantically proximate to the query but factually wrong or misaligned. This is the most widely documented failure mode:

Embedding models encode style and topic, not truth-preserving relationships. “Returns are accepted” and “Returns are not accepted” can be near-neighbors in embedding space (VentureBeat, 2025; Medium/Quaxel, 2026).
The Synonym Trap failure mode: operationally distinct near-synonyms (e.g., “chargeback” vs. “refund reversal”) collapse in embedding space, causing retrieval of related-but-wrong context (Quaxel, 2026).
ColBERT (late-interaction MaxSim) was specifically designed to address this: dense bi-encoders miss fine-grained semantic distinctions, and top-k candidates routinely include passages that share vocabulary but contradict the query intent. Rerankers — especially cross-encoders — were introduced precisely to catch “close-but-wrong candidates before they reach the LLM” (IBM Developer, 2024).

This is a partial analog to TOT’s partial match signature. However, the mechanism differs: in human TOT, the semantic representation is correct and the phonological form is missing. In RAG, the embedding retrieves a semantically proximate document where the semantic content itself is wrong or misaligned. The partial match is at the content level, not the form level.
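The negation example above can be illustrated with a deliberately crude stand-in for an embedding model. The sketch below uses word-count vectors, which is an assumption made for self-containment and not how any cited system works; it only shows why a representation that tracks topic and vocabulary can place a statement and its negation close together.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


policy = embed("Returns are accepted")
negated = embed("Returns are not accepted")
unrelated = embed("Shipping takes five days")

sim_negated = cosine(policy, negated)
sim_unrelated = cosine(policy, unrelated)
print(f"policy vs negated:   {sim_negated:.3f}")
print(f"policy vs unrelated: {sim_unrelated:.3f}")
```

In this toy setup the contradictory sentence scores about 0.87 against the policy while the unrelated sentence scores 0. Learned dense embeddings differ in detail, but the sources cited above report the same qualitative pattern.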

2. Persistent Confusables (Structural Analog)

This is the most compelling structural analog. In RAG systems, the same wrong document is returned repeatedly for semantically similar queries. The InfoQ banking case study (2025) documents this explicitly: a semantic caching system returned fast but semantically incorrect responses repeatedly because similar-sounding queries were treated as identical. This is not random — it is a stable local optimum in embedding space topology.

The mechanism is different from TOT but functionally parallel: just as the human system converges on the same interloper via a reinforced erroneous activation pathway, the RAG system converges on the same wrong document via fixed embedding-space geometry. Given the same query and the same index, the same nearest neighbor is always returned — this is deterministic (not probabilistic) confusable persistence, which is actually stronger topological stability than the TOT case (59% repetition, not 100%).
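A minimal sketch of that deterministic persistence, assuming exact (brute-force) nearest-neighbor search over a fixed index. The three-dimensional vectors and document ids are hypothetical values chosen so that the confusable sits slightly closer to the query than the intended target.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings for a tiny fixed index.
INDEX = {
    "doc_target": (0.70, 0.70, 0.14),
    "doc_confusable": (0.80, 0.60, 0.00),
    "doc_unrelated": (0.00, 0.10, 0.99),
}


def nearest(query):
    """Return the id of the document most similar to the query."""
    return max(INDEX, key=lambda doc_id: cosine(query, INDEX[doc_id]))


# Hypothetical query that was meant to reach doc_target.
query = (0.85, 0.52, 0.02)

results = [nearest(query) for _ in range(100)]
repeat_rate = results.count(results[0]) / len(results)
print(results[0], repeat_rate)
```

Every one of the 100 lookups returns doc_confusable, a repeat rate of 1.0 against the 59% reported for human interlopers. Approximate indexes, index rebuilds, or embedding-model updates could change which document wins, so the determinism holds only while the query, index, and model stay fixed.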

RAGChecker (Ru et al., NeurIPS 2024) introduces the metric Relevant Noise Sensitivity — how often the generator is misled by retrieved context that is thematically related but not the target answer. This operationalizes the confusable problem at the system level.

3. Feeling-of-Knowing Dissociation

The RAG analog here is LLM confidence miscalibration. Recent research documents:

LLMs exhibit overconfidence when noisy/wrong context is retrieved — they feel they have the answer (high generation confidence) even when the retrieved context is wrong.
Contradictory or irrelevant evidence exacerbates overconfidence. The model cannot distinguish “I retrieved the right context” from “I retrieved something fluent and plausible.”
The AAAI 2026 result: “overconfidence, not excessive caution, was the primary reason for LLMs’ poor perception of their knowledge boundaries.”

This is a functional analog to TOT’s feeling-of-knowing dissociation — the metacognitive signal (confidence) is decoupled from retrieval accuracy — but the direction can be inverted: TOT involves high feeling-of-knowing with retrieval failure; RAG systems sometimes show high confidence because they retrieved something (wrong), not despite failing to retrieve.

4. Recovery Dynamics

Human TOT resolves via targeted phonological priming. RAG retrieval failures are addressed via reranking (cross-encoders), query expansion, hybrid sparse+dense retrieval, and CRAG (corrective RAG with self-assessment). These are structurally parallel: each introduces a secondary process to validate or correct the first-pass retrieval. Liu et al. (2024), “Lost in the Middle,” which shows a U-shaped position bias in long-context RAG, can be read as a positional analog: the location of relevant information in context affects access, just as in TOT partial positional information (syllable count, stress position) is accessible without the full form.
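One way to picture the "secondary process" idea for hybrid sparse+dense retrieval is rank fusion. The sketch below uses reciprocal rank fusion, a common merging technique that the sources above do not name; it is swapped in here purely for illustration, and the document ids and both rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse best-first lists of doc ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical first-pass results for a query about chargebacks.
dense_ranking = ["refund_reversal_faq", "chargeback_policy", "shipping_faq"]
sparse_ranking = ["chargeback_policy", "dispute_form", "refund_reversal_faq"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Here the dense list puts the near-synonym document first, but the exact-term list promotes chargeback_policy, and the fused ranking places it on top. Fusion does not guarantee this outcome in general; it only gives a second retrieval signal a chance to outvote the confusable.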

Evidence Against

1. The Fundamental Mechanism Inversion

The single most important disanalogy: the axes of partial access are inverted.

In human TOT, semantic knowledge is intact; phonological form is missing. You know what the word means, what category it belongs to, you may know its first letter or syllable count — but you cannot produce the spoken form. The failure is at the semantic-to-phonological transmission interface.

In RAG, the embedding retrieves a semantically proximate document whose semantic content is wrong. There is no separation between form and content in a document embedding — the chunk either answers the question or it does not. A RAG system does not “know the meaning but not the form” of the target document; it retrieves the wrong document because the embedding has conflated the query with a similar-but-distinct topic.

Karpathy’s framing (see Lens 2) captures this precisely. The claim’s strong form requires shared failure topology, but the topology runs in opposite directions across the two phenomena.

2. No Empirically Validated Distributional Correspondence

The strong-form claim requires showing that the statistical distribution of RAG retrieval failures matches the distribution of TOT signatures. No paper does this. The closest candidate — RAGChecker’s claim-level entailment evaluation — does not map its error taxonomy onto cognitive science categories. There is no study that takes a corpus of RAG retrieval failures and applies Brown & McNeill-style analysis to classify them as “partial phonological access,” “interloper blocking,” or “transmission deficit” analogs. The distributional claim is asserted by analogy, not measured.

3. Determinism vs. Probabilism

TOT interlopers repeat probabilistically (59% over one week). RAG confusables repeat deterministically — the nearest neighbor in a fixed index is not stochastic. This matters for the “same statistical signatures” claim. The underlying generative process is different: TOT repetition reflects learned error reinforcement in a biological network; RAG repetition reflects fixed geometric structure. Calling these “same statistical signatures” conflates mechanism with surface pattern.

4. No Natural “Recovery” Cue Analog

TOT recovery via phonological priming has a specific mechanism: re-activating the phonological pathway from a different entry point. RAG “recovery” via reranking is not analogous — it is running a second, stronger model (cross-encoder) on the same candidates. There is no analog in RAG to “hearing the first syllable suddenly unlocks the full word.” Query expansion modestly changes the retrieval, but the mechanism (approximate nearest neighbor search in vector space) is categorically different from cascaded spreading activation.

5. RAGAS Limitations and Evaluation Fragility

RAGAS (Es et al., 2023/2024), the primary RAG evaluation framework, fails at hallucination detection 83.5% of the time on financial data. This means empirical measurement of the failure modes that would need to map onto TOT signatures is itself unreliable. The confusable-retrieval problem is real, but it is not yet measured with the precision needed to evaluate the strong-form claim.

Active Debate

Is feeling-of-knowing in RAG a property of the retriever or the generator? Recent mechanistic interpretability work (ReDeEP, arXiv 2410.11414) suggests LLM overconfidence under wrong retrieval is localized in Knowledge FFNs overwhelming Copying Heads — which is a distinct substrate from retrieval confidence. Whether retrieval-level confidence signals can be isolated from generation-level ones is unresolved.
Does ColBERT/late-interaction actually eliminate the confusable problem, or just reduce it? ColBERT’s token-level MaxSim is designed to catch near-misses, but the VentureBeat (2025) report on precision tuning shows it still assigns near-identity similarity scores to structural near-misses (opposite-meaning sentences). This keeps the confusable retrieval problem alive even with state-of-the-art late-interaction retrievers.
Are RAG failure topologies stable enough to be characterized? The embedding space geometry is model-specific, version-specific (embedding drift), and domain-specific. Unlike human TOT, which shows consistent signatures across populations, RAG failure topology shifts with model updates. This makes distributional comparison to human TOT especially difficult.

Lens 1 — Rich Hickey (Place vs. Value; Identity vs. State)

Hickey would draw a precise distinction here: memory has place and value. The Brain wiki pattern is preferred precisely because it makes the value (compiled knowledge) the durable artifact, not the place (embedding coordinates).

His diagnosis: TOT in humans is a place-pointer without value access — you have the semantic address of the concept (you know it exists, you know its category, you may know adjacent concepts), but the phonological value (the spoken word) is temporarily inaccessible. The pointer and the value are different things; the pointer works, the value retrieval fails.

RAG embeddings are place pointers into document space. When the embedding lands you in the wrong neighborhood, the place pointer is misaligned — not a case of “pointer works, value inaccessible,” but “pointer returns the wrong address.” These are distinct failure modes: value inaccessibility (TOT) vs. address collision (RAG confusables).

He would push: the productive design implication is not “RAG is like TOT” but “stop conflating place with value in retrieval design.” Immutable, value-oriented retrieval (return the actual document, not a latent approximation) with explicit identity separation (query-time identity ≠ document-time identity) is the right architectural move. He would favor sparse retrieval with explicit term matching (BM25) as a value-oriented complement to dense embeddings, precisely because term matching preserves identity rather than collapsing it into place.

He’d want a precise mathematical claim before accepting strong form: “Show me the distance distribution in embedding space for confusable retrieval failures. Is it bimodal? Is there a characteristic gap between target and interloper rank? Without that, this is just vibes.”

Lens 2 — Karpathy (Embedding Space; Retrieval Mechanics)

Karpathy would immediately identify the inversion problem. In his framing:

Human TOT = WORD-FORM failure with semantic completion. You have the semantic embedding — the meaning is there, the contextual neighborhood is there — but the lookup from semantic space to word-form space breaks. The failure is at the phonological decoder, not the semantic encoder.

RAG confusable = SEMANTIC neighborhood retrieval without semantic precision. The embedding retrieves the right topical neighborhood but the wrong specific document. You get the semantic zone without the exact answer. This is not TOT — it is more like semantic paraphasia (in aphasiology): substituting a semantically related but wrong item.

His calibrated assessment: the analogy is more productive at the engineering level than the cognitive science level. Both phenomena identify a class of graceful degradation under retrieval failure — you get something, it’s not random, it’s in the right neighborhood, and that partial signal can be used to recover. For agent system design, this means:

Treat confusable retrievals as partial signals, not failures. When top-k contains near-misses, ask whether secondary signals (metadata, date, structural position in document) can disambiguate.
Build explicit recovery paths. Just as TOT resolves via phonological priming, RAG needs targeted “prompting” of the retrieval — query rewriting, hypothetical document embeddings (HyDE), or iterative narrowing (CRAG loops).
Calibrate LLM confidence to retrieval certainty. The dissociation of feeling-of-knowing from access is the most directly actionable analog — overconfident generation on uncertain retrieval is a real, measurable, solvable problem.
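The first and third points above can be sketched as a small gate that inspects retrieval scores before generation. Everything named here is hypothetical: the Hit type, the retrieval_certainty function, and both thresholds are illustrative assumptions that would need tuning on real data.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float  # similarity score, higher is better


def retrieval_certainty(hits, min_score=0.75, min_margin=0.05):
    """Classify a result list as 'confident', 'ambiguous', or 'no_match'.

    min_score and min_margin are placeholder thresholds (assumptions).
    """
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    if not ranked or ranked[0].score < min_score:
        return "no_match"
    # A runner-up almost as close as the winner signals a likely confusable.
    if len(ranked) > 1 and ranked[0].score - ranked[1].score < min_margin:
        return "ambiguous"
    return "confident"


clear = [Hit("chargeback_policy", 0.91), Hit("shipping_faq", 0.70)]
near_miss = [Hit("chargeback_policy", 0.91), Hit("refund_reversal_faq", 0.89)]
weak = [Hit("shipping_faq", 0.40)]

for name, hits in [("clear", clear), ("near_miss", near_miss), ("weak", weak)]:
    print(name, retrieval_certainty(hits))
```

An "ambiguous" result is the cue to disambiguate with metadata or trigger a recovery path such as query rewriting, and the label can be passed to the generator so that its stated confidence tracks retrieval certainty instead of fluency.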

On the strong form: “I’d want to see the confusion matrix of retrieval failures plotted as a function of embedding distance from the target. If the confusable failure distribution peaks at a characteristic distance — say 0.85–0.95 cosine similarity — and human TOT interlopers show a comparable ‘close but not target’ distributional signature, you’d have real evidence. I don’t see that paper yet.”
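The analysis requested in that quote could start as a simple histogram of interloper-to-target similarity over logged failures. The similarity values below are invented purely to show the mechanics; they are not measurements, and the 0.85 to 0.95 band is taken from the quote as a hypothesis to test, not a finding.

```python
from collections import Counter


def bin_lower_edge(similarity, width_thousandths=50):
    """Lower edge of the bin containing a similarity (default bins: 0.05 wide).

    Works in integer thousandths to avoid floating-point floor surprises.
    """
    idx = round(similarity * 1000) // width_thousandths
    return idx * width_thousandths / 1000


def similarity_histogram(similarities):
    """Count how many similarities fall into each bin."""
    return Counter(bin_lower_edge(s) for s in similarities)


# Hypothetical cosine similarities between the wrongly retrieved document
# and the intended target, one value per logged retrieval failure.
interloper_sims = [0.91, 0.88, 0.93, 0.62, 0.87, 0.90, 0.95, 0.71, 0.89, 0.92]

hist = similarity_histogram(interloper_sims)
peak_bin, peak_count = hist.most_common(1)[0]
in_band = sum(1 for s in interloper_sims if 0.85 <= s < 0.95) / len(interloper_sims)

for edge in sorted(hist):
    print(f"{edge:.2f}+ {'#' * hist[edge]}")
print("peak bin:", peak_bin, "share in 0.85-0.95 band:", in_band)
```

A real test of the strong form would need the same histogram built from measured RAG failures and a comparable closeness measure for human interlopers, which is the study the quote says does not yet exist.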

Steelman

The strongest version of this claim is narrower and more defensible than stated:

Steelmanned claim: RAG retrieval failures exhibit a structural analog to the interloper/blocking sub-type of TOT, specifically in the persistent confusable signature. When a RAG system repeatedly retrieves Document X instead of Document Y for semantically similar queries, the mechanism (convergence on a stable local optimum in a high-dimensional similarity landscape) is structurally homologous to TOT interloper repetition (59% same-interloper repetition, Frontiers 2019), where the system converges on an erroneous activation minimum. Both involve non-random topological trapping, deterministic in RAG and probabilistic in TOT: the retrieval system, whether biological or artificial, is not randomly failing; it is repeatedly, predictably landing in the same wrong place because the landscape has a local attractor near the target.

The FOK dissociation analog is real and operationally important: LLM overconfidence on wrong retrieval is a measurable, documented failure mode that aligns structurally with the metacognitive monitoring failure in TOT (you think you know, you don’t). The key difference is directional: human TOT = high FOK despite retrieval failure; LLM = high confidence caused by plausible-but-wrong retrieval.

The productive design implications follow from the weak form: treat near-miss retrieval as partial signal, build explicit disambiguation layers (reranking = partial analog of phonological priming), and calibrate generation confidence to retrieval certainty.

Verdict

WEAK-FORM SUPPORTED / STRONG-FORM NOT ESTABLISHED

The four TOT signatures have partial RAG analogs:

TOT Signature	RAG Analog	Quality of Match
Partial match	Semantically near but wrong retrieval	Moderate — mechanism inverted (form missing vs. content wrong)
Persistent confusables	Same wrong document returned repeatedly	Moderate-strong — structural similarity, different mechanism (learned error vs. fixed geometry)
FOK dissociation	LLM overconfidence on wrong context	Moderate — directionally inverted but operationally analogous
Recovery dynamics	Reranking, query expansion, CRAG loops	Weak — no true phonological priming analog

The fundamental asymmetry (semantic intact / phonological missing in TOT vs. semantic content itself wrong in RAG) means the strong form, shared distributional topology, is not yet established and may be unfalsifiable as stated. No study maps RAG failure distributions onto TOT taxonomic categories at a statistical level.

The weak form is genuinely productive for agent design: the TOT literature’s toolkit — partial cue utilization, interloper identification, targeted priming for recovery, dissociation of metacognitive confidence from access — maps onto tractable RAG engineering problems (reranking design, confidence calibration, iterative retrieval correction). This justifies the analogy as a design heuristic even if the underlying mechanistic claim does not hold.

Papers to Read

Brown, R., & McNeill, D. (1966). The “tip of the tongue” phenomenon. Journal of Verbal Learning and Verbal Behavior, 5(4), 325–337. PsycNET
Burke, D.M., MacKay, D.G., Worthley, J.S., & Wade, E. (1991). On the tip of the tongue: What causes word finding failures in young and older adults? Journal of Memory and Language, 30(5), 542–579.
Schwartz, B.L., & Metcalfe, J. (2011). Tip-of-the-tongue (TOT) states: Retrieval, behavior, and experience. Memory & Cognition, 39(5), 737–749. Springer
Abrams, L., & Davis, D.K. (2016). Tip-of-the-tongue phenomenon. In Cognitive aging: A primer (2nd ed.). Pomona PDF
Frontiers in Psychology (2019). Phonological interlopers tend to repeat when tip-of-the-tongue states repeat. PMC6393332
Liu, N.F., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL. arXiv 2307.03172
Ru, D., et al. (2024). RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation. NeurIPS 2024. arXiv 2408.08067
Es, S., et al. (2023/2024). RAGAS: Automated Evaluation of Retrieval-Augmented Generation. EACL 2024. ACL Anthology
ReDeEP (2024). Detecting Hallucination in RAG via Mechanistic Interpretability. arXiv 2410.11414
The Tip-of-the-Tongue Phenomenon: Cognitive, Neural, and Neurochemical Perspectives (2026). Biomedicines. PMC12938793
[unverified] Schwartz, B.L. (2002). Tip-of-the-Tongue States: Phenomenology, Mechanism, and Lexical Retrieval. Erlbaum. — primary Schwartz review monograph, not directly fetched.

Papers consulted

Each tick is one paper. The x-axis is publication year, from the early human-memory literature to current preprints. Tick height is provenance — how many other dossiers cite the same paper. Hover for the citation; a separate reading list indexes the full set.

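[Figure: timeline of papers consulted. X-axis: publication year, with ticks at 1950, 1980, 2000, 2017, 2024, and 2026; tick height: provenance.]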

What the agent actually changed its mind about

The orchestrator forced two revisions. The first walked back the strong form when the cleanest empirical signal disappeared on a second base model. The second retracted a claim of statistical significance when a re-analysis with cluster-robust standard errors widened the interval to cross zero. Both edits are recorded as commits in the dossier's repo; neither was bundled into a single “final answer.”

The verdict pill at the top of this page is a summary, not a conclusion. The conclusion is the trail.
