Reading list · 25 sources · 9 load-bearing

What the dossiers cited.

Sorted by provenance density — how many of the ten dossiers cite the same paper. The load-bearing papers (★) are the structure of the entire session: where the dossiers agree on what to read, even when they disagree on what it means.

Verification status (re-fetched 2026-05-10): verified 10 · corrected 3 · by-name 12

Thread

Memory & Context (Claims 1, 5, 8, 9)

Working memory, retrieval failure, sleep consolidation, active forgetting — convergent on 'LLMs as place-oriented memory systems with no native consolidation.'

1956 · The Magical Number Seven, Plus or Minus Two · Miller, G. A. · Psychological Review 63(2) · The original. Read before believing any 7±2 LLM analogy. · by-name
2001 · The magical number 4 in short-term memory · Cowan, N. · Behavioral and Brain Sciences 24 · The revision. 4±1 is the better human number once chunking is controlled for. · by-name
2023 · Lost in the Middle: How Language Models Use Long Contexts · Liu, Lin, Hewitt, Paranjape, Bevilacqua, Petroni & Liang · TACL · arXiv:2307.03172 · The positional U-curve. ~30% accuracy drop on mid-context relevant info. · verified
2024 · RULER: What's the Real Context Size of Your Long-Context Language Models? · Hsieh, Sun, Kriman, Acharya, Rekesh, Jia, Zhang & Ginsburg · arXiv · arXiv:2404.06654 · Multi-needle benchmark. Effective vs. nominal context. · verified
2024 · BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack · Kuratov, Bulatov, Anokhin, Rodkin, Sorokin, Sorokin & Burtsev · NeurIPS Datasets & Benchmarks · arXiv:2406.10149 · 10–20% effective utilization finding. · verified
1995 · Why there are complementary learning systems in the hippocampus and neocortex · McClelland, McNaughton & O'Reilly · Psychological Review 102 · The CLS framework. Still the best architectural recipe for two-tier memory. · by-name
2010 · The memory function of sleep · Diekelmann & Born · Nature Reviews Neuroscience 11 · Two-phase consolidation evidence; causal role of sharp-wave ripples. · by-name
2020 · Brain-inspired replay for continual learning with artificial neural networks · van de Ven, Siegelmann & Tolias · Nature Communications 11 · State-of-the-art neuro-inspired continual learning. · by-name
1966 · The 'Tip of the Tongue' Phenomenon · Brown & McNeill · Journal of Verbal Learning & Verbal Behavior 5 · The original TOT paper. Form/content separation is the load-bearing distinction. · by-name
2021 · Machine Unlearning · Bourtoule et al. · IEEE S&P · Starting point for the unlearning literature. · by-name
Thread

Architecture & Computation (Claims 2, 10)

Brain ↔ transformer mappings hold at the algorithmic level on a restricted subspace; the strong 'homology' claims fail.

2017 · Attention Is All You Need · Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser & Polosukhin · NeurIPS · arXiv:1706.03762 · Required reading. · verified
2020 · Hopfield Networks is All You Need · Ramsauer, Schäfl, Lehner, Seidl et al. · ICLR 2021 · arXiv:2008.02217 · The modern Hopfield update rule is equivalent to transformer attention. The strongest formal brain–AI bridge. · verified
2025 · Multihead self-attention in cortico-thalamic circuits · Granier & Senn · arXiv · arXiv:2504.06354 · Maps cortico-thalamic circuits to linear attention. The newest and most important brain–AI paper in this package. (Title corrected from earlier shorthand 'From Cortico-Thalamic Circuits to Linear Attention' on 2026-05-10.) · corrected
2005 · The cortical column: a structure without a function · Horton & Adams · Phil. Trans. R. Soc. B 360 · The skeptical paper. Read before any 'column ≈ block' analogy. · by-name
1998 · On the actions that one nerve cell can have on another: distinguishing 'drivers' from 'modulators' · Sherman & Guillery · PNAS 95 · Driver/modulator asymmetry that breaks symmetric attention QKV homology. · by-name
2012 · Canonical Microcircuits for Predictive Coding · Bastos et al. · Neuron 76 · Closer to a residual stream with layer specialization than to multi-head lateral attention. · by-name
Thread

Metacognition & Self-Model (Claims 4, 6)

LLM self-models are real but shallow; the legible CoT trace cannot be trusted as a window into them.

2022 · Language Models (Mostly) Know What They Know · Kadavath, Conerly, Askell et al. · arXiv (Anthropic) · arXiv:2207.05221 · Foundational behavioral metacognition paper. · verified
2023 · Are Emergent Abilities of Large Language Models a Mirage? · Schaeffer, Miranda & Koyejo · NeurIPS 2023 · arXiv:2304.15004 · Read this before believing any phase-transition claim. · verified
2022 · Discovering Latent Knowledge in Language Models Without Supervision · Burns, Ye, Klein & Steinhardt · arXiv · arXiv:2212.03827 · Contrast-Consistent Search (CCS); mechanistic introspection. · verified
2024 · Large Language Models Cannot Self-Correct Reasoning Yet · Huang et al. · ICLR 2024 · The self-correction limit. Intrinsic self-correction (no external feedback) often degrades performance. · by-name
2023 · Consciousness in Artificial Intelligence: Insights from the Science of Consciousness · Butlin, Long, Elmoznino, Bengio, Birch, Constant, Deane, Fleming, Frith, Ji, Kanai, Klein, Lindsay, Michel, Mudrik, Peters, Schwitzgebel, Simon & VanRullen · arXiv · arXiv:2308.08708 · The major paper applying multiple consciousness theories to LLMs. (Author list corrected on 2026-05-10: Chalmers is not an author; Bengio and Birch are among the actual co-authors.) · corrected
2023 · Integrated information theory (IIT) 4.0: Formulating the properties of phenomenal existence in physical terms · Albantakis et al. · PLOS Computational Biology 19(10) · PMC10581496 · IIT 4.0. Predicts near-zero Φ for transformers. · by-name
Thread

Social & Identity (Claims 3, 7)

What looks like emergent social cognition in LLMs is mostly pretraining-derived capability being elicited, not generated.

2023 · Personality Traits in Large Language Models · Serapio-García, Safdari, Crepy, Sun, Fitz, Romero, Abdulhai, Faust & Matarić · arXiv · arXiv:2307.00184 · Reliable psychometric measurement in instruction-tuned models. · verified
2025 · Persona Vectors: Monitoring and Controlling Character Traits in Language Models · Chen, Arditi, Sleight, Evans & Lindsey · arXiv · arXiv:2507.21509 · Mechanistic confirmation that traits are causal linear directions in activation space. (Earlier shorthand attribution as 'Anthropic Persona Vectors' was overbroad: only one of the five authors is at Anthropic; corrected 2026-05-10.) · corrected
2023 · Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks · Ullman · arXiv · arXiv:2302.08399 · Required counter-reading to Kosinski 2023. Adversarial-perturbation critique. · verified
A note on verification

Each citation tagged verified was re-fetched from arXiv on 2026-05-10, and its title and authors were confirmed against the arXiv record. Tagged corrected means the re-fetch surfaced a title or attribution error in an earlier draft; the entry has been fixed in place. Entries tagged by-name are older, well-cited works that were not re-fetched but are uniquely identifiable by a title-and-author search.

Two corrections worth flagging: Granier & Senn 2025's actual title is "Multihead self-attention in cortico-thalamic circuits", and Butlin et al. 2023 on consciousness in AI does not include David Chalmers as an author — it does include Yoshua Bengio, Jonathan Birch, Chris Frith, and others.
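
The tagging rule in this note can be sketched in Python. This is an illustration under stated assumptions, not the tooling used for the re-fetch: `Citation`, `normalize`, and `classify` are hypothetical names, the matching rule (accent- and case-insensitive exact match on title and surnames) is an assumption, and no network request is made. The `fetched` argument stands in for whatever metadata the re-fetch returned.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Citation:
    """Hypothetical record: a title plus author surnames in listed order."""
    title: str
    surnames: Tuple[str, ...]


def normalize(text: str) -> str:
    """Strip accents, case-fold, and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(unaccented.casefold().split())


def classify(draft: Citation, fetched: Optional[Citation]) -> str:
    """Tag one entry; `fetched` is None when the work was not re-fetched."""
    if fetched is None:
        return "by-name"
    same_title = normalize(draft.title) == normalize(fetched.title)
    same_authors = tuple(map(normalize, draft.surnames)) == tuple(
        map(normalize, fetched.surnames)
    )
    # Any mismatch means the draft entry needed fixing (assumed rule).
    return "verified" if same_title and same_authors else "corrected"


# The Granier & Senn title correction from this note, replayed as data.
draft = Citation(
    "From Cortico-Thalamic Circuits to Linear Attention", ("Granier", "Senn")
)
fetched = Citation(
    "Multihead self-attention in cortico-thalamic circuits", ("Granier", "Senn")
)
status = classify(draft, fetched)  # "corrected": the titles differ
```

Exact matching after normalization is deliberately strict: a shorthand title or an extra author name flips the entry to corrected rather than passing silently, which is the failure mode the two flagged corrections illustrate.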