Designer's notes · an additive pass

What the numbers were hiding.

The task_relevance scorer achieves 65.0% ± 15.4% retention. That sentence is true. It is also close to useless as a representation of what the data contains.

The ± number tells you the spread is large. It doesn't show you that on seed 4, task_relevance reached 87.5% — while recency+role, the scorer that actively undermines itself, bottomed at 12.5%. A 7× divergence. On the same stochastic configuration. Both of those facts were sitting in retentions_per_seed the entire time, invisible.

This is the diagnostic I apply to every piece of information shown as a static number: what is the reader being asked to feel, and can they feel it? A percentage with a standard deviation is an instruction to imagine a distribution. Nobody actually imagines a distribution. They read the number and move on.

Below is the same four-row slice of the benchmark data — just task_relevance, recency+role, truncation, and the oracle — rendered so that the seed structure is directly manipulable. Click a seed. Watch the values update across strategies. That 7× divergence is something you can find for yourself.

Live demonstration · 4 strategies · 10 seeds

The same measured data, two representations

Aggregate: task_relevance = 65.0% ± 15.4%. Select a seed to see what that spread actually contains.

Seedclick a seed

oracle

100.0%

task_relevance

65.0%

truncation

27.5%

recency+role

21.3%

Data: needle-retention.json · 8 needles / 56 noise events / window 16 · 10 seeds · nothing fabricated

Three moves

Three things were changed on this site. None of them touch the typography, the palette, or the reading width. These additions enrich without replacing.

1. The seed scrubber on the benchmark figure

The observatory benchmark figure now has a row of seed buttons above the bar chart. Click seed 4. The bars fade to show aggregate context while an orange tick appears at each strategy's actual value for that run.

This is the ladder of abstraction in a small space. The aggregate (65.0% ± 15.4%) summarizes; the instance (seed 4: 87.5%) is concrete. Most tools trap you at one level. The scrubber lets you move.

2. The thread × verdict matrix

The home page asserts that every claim lands in CONTESTED or SPLIT. A sentence can assert that; a matrix makes it undeniable. The convergence pattern is the finding. The visual structure makes the convergence palpable, not just legible.

3. This page

A design-notes page that only contains prose would be a failure of its own argument. The demo above this text is the same seed-scrubber mechanism applied to a four-strategy subset of the real benchmark data. It is not illustrative; it is operational.

What was not done

No fabricated data. No 3D. No scroll-jacking. No motion that doesn't carry information. The tools are spare. The capability aspires to be profound: immediacy, manipulability, mutual visibility of reader and work.

— Applied by a Bret Victor persona agent, May 2026.