A research notebook · Session 04 · Personas as analytical lenses

We tested 10 claims about AI ↔ brain at the frontier, ran them through 9 persona lenses, and produced zero clean verdicts.

Strong forms systematically failed. Weak forms systematically held. The interesting object isn't the table of results — it's the shape of the disagreement. The page below is the agents' working notebook, with handles. Drag a numeral above; the page beneath rearranges. Reading is doing.

10 claim dossiers · 9 persona lenses · 4-persona roundtable, 2 rounds · 1 primitive shipped · 21 autonomous research dispatches

The shape of the disagreement.

Each cell counts the claims in one research thread (rows) that received a given verdict (columns). The empty vindicated column is the headline. Hover a cell, a chip, or a persona; everything else on this page answers.

 
Columns (verdict): vindicated, plausible, contested, split, refuted, unfalsifiable
Rows (research thread): memory, architecture, metacognition, social

The roster, as a network.

Nine analytical lenses. Solid edges connect personas paired on a claim — thicker for more papers in that dossier. Dashed edges connect the four roundtable personas. Hover a node to see what it touched; hover a verdict cell above to see which pairs produced that verdict.

Each claim picked two of these to argue with itself.

Every claim, plotted.

x: research thread  ·  y: verdict  ·  size: papers cited  ·  color: verdict. Click to read the dossier; hover to light up its lenses above.

y (verdict): unfalsifiable, refuted, split, contested, plausible, vindicated
x (research thread): memory, architecture, metacognition, social
Threads

Four convergent threads.

The ten claims sort into four convergent threads. Each thread carries an engineering recommendation that survived adversarial review.

Wave 3 · The Observatory

The benchmark, with handles.

The research's strongest engineering surface — agent context as an immutable event log with importance and confidence as pure functions over the log — packaged as a Python primitive you can install and run. Different scorers can be A/B'd against the same historical events; failure modes become deterministically replayable.

# install
pip install claim-observatory

# usage
from observatory import EventLog, view, importance, compare  # compare is used below; assumed to be a top-level export

log = EventLog()
log.append({"role": "user", "content": "..."})
log.append({"role": "assistant", "content": "..."})
log.append({"role": "tool", "name": "search", "content": "..."})

# tier membership is a derived view, not a stored place
working = view(
    log,
    scorer=importance.recency_attention(),
    window=4096,
)

# A/B test scorers against the same log — deterministic replay
alt = view(log, scorer=importance.task_relevance(query="..."), window=4096)
delta = compare(working, alt)
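
To make the idea concrete without depending on the package, here is a minimal standalone sketch of the same pattern: an append-only log, scorers as pure functions over that log, and views derived on demand. It is not the claim-observatory implementation. The names mirror the snippet above only for readability, and the `Event` fields, the two scorers, and a `budget` counted in events (rather than whatever unit `window` uses) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    """One immutable log entry (hypothetical shape, for illustration)."""
    seq: int      # position in the log, assigned on append
    role: str
    content: str


class EventLog:
    """Append-only log: events can be added, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, role: str, content: str) -> Event:
        event = Event(seq=len(self._events), role=role, content=content)
        self._events.append(event)
        return event

    def events(self) -> tuple[Event, ...]:
        # Hand out an immutable snapshot so callers cannot mutate the log.
        return tuple(self._events)


# A scorer is a pure function of (event, full log snapshot) -> score.
Scorer = Callable[[Event, tuple[Event, ...]], float]


def recency(event: Event, events: tuple[Event, ...]) -> float:
    """Newer events score higher; depends only on log position."""
    return (event.seq + 1) / len(events)


def keyword_relevance(query: str) -> Scorer:
    """Build a scorer: fraction of query terms that appear in the event."""
    terms = set(query.lower().split())

    def score(event: Event, events: tuple[Event, ...]) -> float:
        words = set(event.content.lower().split())
        return len(terms & words) / len(terms) if terms else 0.0

    return score


def view(log: EventLog, scorer: Scorer, budget: int) -> tuple[Event, ...]:
    """Derived view: the `budget` highest-scoring events, in log order."""
    events = log.events()
    # Ties break on seq so the result is fully deterministic.
    ranked = sorted(events, key=lambda e: (-scorer(e, events), e.seq))
    return tuple(sorted(ranked[:budget], key=lambda e: e.seq))


def compare(a: tuple[Event, ...], b: tuple[Event, ...]) -> dict[str, list[int]]:
    """Sequence numbers kept by one view but not the other."""
    sa, sb = {e.seq for e in a}, {e.seq for e in b}
    return {"only_a": sorted(sa - sb), "only_b": sorted(sb - sa)}


log = EventLog()
log.append("user", "find papers on hippocampal replay")
log.append("tool", "search returned three papers on replay")
log.append("assistant", "summarising the first paper")
log.append("user", "thanks, what is the weather")

recent = view(log, recency, budget=2)
relevant = view(log, keyword_relevance("hippocampal replay papers"), budget=2)
delta = compare(recent, relevant)

print([e.seq for e in recent])    # [2, 3]
print([e.seq for e in relevant])  # [0, 1]
print(delta)                      # {'only_a': [2, 3], 'only_b': [0, 1]}
```

Because the log is never mutated and each scorer depends only on its inputs, calling `view` again with the same scorer returns the same result, which is what makes a disagreement between two scorers replayable rather than anecdotal.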
Three ways in

How to read this notebook.