A research notebook · Session 04 · Personas as analytical lenses

We tested 10 claims about AI ↔ brain at the frontier, ran them through 9 persona lenses, and produced zero clean verdicts.

Strong forms systematically failed. Weak forms systematically held. The interesting object isn't the table of results — it's the shape of the disagreement. The page below is the agents' working notebook, with handles. Drag a numeral above; the page beneath rearranges. Reading is doing.

10 claim dossiers · 9 persona lenses · 4-persona roundtable, 2 rounds · 1 primitive shipped · 21 autonomous research dispatches

The shape of the disagreement.

Each cell counts the claims in one research thread (rows) that received a given verdict (columns). The empty vindicated column is the headline. Hover a cell, a chip, or a persona, and everything else on this page answers.

[Interactive heatmap. Columns (verdicts): vindicated, plausible, contested, split, refuted, unfalsifiable. Rows (research threads): memory, architecture, metacognition, social.]

01 Magical Number Seven · 02 Thalamic-Cortical Equivalence · 03 Persona States · 04 Metacognition · 05 RAG TOT · 06 CoT Phenomenology · 07 Spontaneous ToM · 08 Sleep Consolidation · 09 Active Forgetting · 10 Cortical Column

The roster, as a network.

Nine analytical lenses. Solid edges connect personas paired on a claim — thicker for more papers in that dossier. Dashed edges connect the four roundtable personas. Hover a node to see what it touched; hover a verdict cell above to see which pairs produced that verdict.

Each claim picked two of these to argue with itself.

Every claim, plotted.

x: research thread · y: verdict · size: papers cited · color: verdict. Click to read the dossier; hover to light up its lenses above.

[Interactive scatter plot. y-axis (verdict): unfalsifiable, refuted, split, contested, plausible, vindicated. x-axis (research thread): memory, architecture, metacognition, social. Points: claims 01 to 10.]

Threads

Four convergent threads.

The ten claims sort into four convergent threads. Each thread carries an engineering recommendation that survived adversarial review.

Thread · memory

Memory & Context

LLMs are place-oriented memory systems with shallow effective capacity and no native consolidation/forgetting machinery.

01 Magical Number Seven · 05 RAG TOT · 08 Sleep Consolidation · 09 Active Forgetting

Thread · architecture

Architecture & Computation

Brain ↔ transformer mappings hold at the algorithmic level on a restricted subspace; strong 'homology' claims fail.

02 Thalamic-Cortical Equivalence · 10 Cortical Column

Thread · metacognition

Metacognition & Self-Model

LLM self-models are real but shallow; the legible CoT trace cannot be trusted as a window into them.

04 Metacognition · 06 CoT Phenomenology

Thread · social

Social & Identity

What looks like emergent social cognition is mostly pretraining-derived capability being elicited, not generated.

03 Persona States · 07 Spontaneous ToM

Wave 3 · The Observatory

The benchmark, with handles.

The research's strongest engineering surface is agent context modeled as an immutable event log, with importance and confidence computed as pure functions over that log. It is packaged as a Python primitive you can install and run. Different scorers can be A/B tested against the same historical events, and failure modes become deterministically replayable.


# install
pip install claim-observatory

# usage
# compare is used below; importing it from the package root is an assumption
from observatory import EventLog, view, importance, compare

log = EventLog()
log.append({"role": "user", "content": "..."})
log.append({"role": "assistant", "content": "..."})
log.append({"role": "tool", "name": "search", "content": "..."})

# tier membership is a derived view, not a stored place
working = view(
    log,
    scorer=importance.recency_attention(),
    window=4096,
)

# A/B test scorers against the same log — deterministic replay
alt = view(log, scorer=importance.task_relevance(query="..."), window=4096)
delta = compare(working, alt)
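
To make the idea concrete without relying on the package itself, here is a minimal standard-library sketch of the same pattern: an append-only log, scorers written as pure functions over that log, and a view derived on demand. Every name in it (`Event`, `AppendOnlyLog`, `select_view`, `diff_views`, `recency`, `keyword_relevance`) is hypothetical and is not the `observatory` API. As a simplifying assumption, the budget counts events rather than tokens.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """One immutable log entry (hypothetical shape, not the package's)."""
    seq: int
    role: str
    content: str


class AppendOnlyLog:
    """Events can be appended but never edited or removed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, role: str, content: str) -> Event:
        event = Event(seq=len(self._events), role=role, content=content)
        self._events.append(event)
        return event

    def events(self) -> Tuple[Event, ...]:
        # Hand out an immutable snapshot so callers cannot mutate the log.
        return tuple(self._events)


# A scorer is a pure function of (event, full log snapshot) -> score.
Scorer = Callable[[Event, Tuple[Event, ...]], float]


def recency(event: Event, events: Tuple[Event, ...]) -> float:
    """Newest event scores 1.0; older events decay with distance."""
    return 1.0 / (len(events) - event.seq)


def keyword_relevance(query: str) -> Scorer:
    """Build a scorer: fraction of query terms present in the event text."""
    terms = set(query.lower().split())

    def score(event: Event, events: Tuple[Event, ...]) -> float:
        words = set(event.content.lower().split())
        return len(terms & words) / len(terms) if terms else 0.0

    return score


def select_view(log: AppendOnlyLog, scorer: Scorer, budget: int) -> Tuple[Event, ...]:
    """Derive the working set: top `budget` events by score, in log order."""
    events = log.events()
    ranked = sorted(events, key=lambda e: (-scorer(e, events), e.seq))
    return tuple(sorted(ranked[:budget], key=lambda e: e.seq))


def diff_views(a: Tuple[Event, ...], b: Tuple[Event, ...]) -> Dict[str, List[int]]:
    """Report which event sequence numbers appear in only one view."""
    a_ids = {e.seq for e in a}
    b_ids = {e.seq for e in b}
    return {"only_a": sorted(a_ids - b_ids), "only_b": sorted(b_ids - a_ids)}


log = AppendOnlyLog()
log.append("user", "find papers on sleep consolidation")
log.append("tool", "search returned three papers")
log.append("assistant", "summarizing the first result")
log.append("user", "thanks")

# Two scorers, one log: neither view is stored, both are recomputed on demand.
by_recency = select_view(log, recency, budget=2)
by_query = select_view(log, keyword_relevance("sleep papers"), budget=2)
delta = diff_views(by_recency, by_query)
```

Because the scorers read only the log snapshot and the sort breaks ties on `seq`, rerunning `select_view` over the same events returns the same view. That property is what makes A/B comparison and replay of a failure deterministic in this sketch.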

Three ways in

How to read this notebook.

Skim

10 min

Read the synthesis only.


Survey

1 hour

Synthesis + the 3–4 claims most relevant to your work.


Deep

3 hours

All ten dossiers in order, plus the curated reading list.
