Research as Documentation: Pre-Registration and Replication

Alexander Bering

August 19, 2025 · 3 min read

The point at which a project needs a method

By mid-2025 there was enough working code that the risk shifted. The danger was no longer "will any of this run" but "will we be able to trust, and later defend, what it tells us." An independent effort has no institutional review board looking over its shoulder. That absence has to be replaced with method, deliberately, or the results are worth very little.

So we made documentation a research output in its own right, on the same footing as the code. Three practices carried most of the weight.

Pre-registration, with a timestamp

Before running the experiments that mattered, we wrote down what we expected and how we would measure it, and anchored those documents in time using OpenTimestamps, which produces a cryptographic proof that a file existed, byte for byte, no later than a given date.

The reason is simple and slightly uncomfortable: it is very easy, after seeing results, to convince yourself you predicted them. A timestamp does not remove that temptation, but it removes the opportunity: a decision recorded before the data cannot be quietly rewritten after it. For a single-investigator effort this is one of the cheapest available defences against self-deception.
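To make the mechanism concrete: a timestamp proof commits to a digest of the file's exact bytes, so any later edit to the document is detectable. The sketch below shows only that digest step, using Python's standard library. The file name and the helper names `digest_file` and `matches_registration` are illustrative assumptions, not part of our tooling, and the timestamping itself would be done separately with the OpenTimestamps client (for example `ots stamp` on the same file).

```python
import hashlib
import tempfile
from pathlib import Path


def digest_file(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's exact bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large documents do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_registration(path: Path, registered_digest: str) -> bool:
    """True if the file is byte-identical to the version that was registered."""
    return digest_file(path) == registered_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical pre-registration document; the content is a placeholder.
        prereg = Path(tmp) / "prereg.txt"
        prereg.write_text("Expectation and measurement plan go here.\n")

        registered = digest_file(prereg)  # record this before running anything
        print(matches_registration(prereg, registered))

        # Any later edit, however small, changes the digest.
        prereg.write_text("Expectation rewritten after seeing results.\n")
        print(matches_registration(prereg, registered))
```

The point of the design is that the check needs no trust in the author: anyone holding the document and the timestamped digest can recompute it.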

Replication as the default

Every result we care about ships with the material needed to reproduce it (the data, the configuration, the procedure) rather than as a number on a slide. This is partly principle and partly self-interest: code that cannot be re-run is code whose results you will eventually be unable to explain, including to yourself six months later.

It also changes how a claim reads to an outside reader. "We observed X" is an assertion. "We observed X, here is how to obtain it" is an invitation to check. Only the second belongs in research.
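One way to picture "the data, the configuration, the procedure" travelling with a result is a small manifest stored next to each number. What follows is a minimal sketch under stated assumptions: `RunManifest`, `make_manifest`, `replicates`, the field names and the tolerance are all hypothetical choices for illustration, not a description of our actual format.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RunManifest:
    data_sha256: str   # digest of the exact input data
    config: dict       # parameters the run was started with
    command: str       # the procedure: how the run was invoked
    result: float      # the headline number being claimed


def canonical_json(obj) -> str:
    """Serialise with sorted keys so equal configs compare equal as text."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def make_manifest(data: bytes, config: dict, command: str, result: float) -> RunManifest:
    return RunManifest(
        data_sha256=hashlib.sha256(data).hexdigest(),
        config=dict(config),
        command=command,
        result=result,
    )


def replicates(original: RunManifest, rerun: RunManifest, tolerance: float = 1e-9) -> bool:
    """A re-run replicates if inputs match and the result agrees within tolerance."""
    same_inputs = (
        original.data_sha256 == rerun.data_sha256
        and canonical_json(original.config) == canonical_json(rerun.config)
        and original.command == rerun.command
    )
    return same_inputs and abs(original.result - rerun.result) <= tolerance
```

A reader who re-runs the procedure builds their own manifest and calls `replicates(published, theirs)`. A mismatch in inputs and a mismatch in the result are different failures, and a fuller version would report which one occurred.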

Negative results are results

The last practice is the hardest to keep: writing down what did not work. Approaches that looked promising and underperformed, parameters that turned out not to matter, mechanisms we expected to help and that did not. These rarely make it into public writeups, which is exactly why public writeups tend to overstate how clean the path was.

We keep the record because the failures carry information — they mark the boundaries of where the method actually holds — and because a research programme that only ever reports its wins is not one a serious reader should trust.
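A ledger of this kind can be very plain. The sketch below assumes one record per experiment with a fixed outcome vocabulary, so that a failure has to be filed under the same schema as a success. The names `LedgerEntry`, `OUTCOMES`, `to_jsonl` and `outcome_counts` are hypothetical, and the entries in the usage note are placeholders rather than real findings.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass

# Assumed vocabulary; a real ledger might use different categories.
OUTCOMES = ("supported", "not_supported", "inconclusive")


@dataclass(frozen=True)
class LedgerEntry:
    experiment: str    # short identifier for what was tried
    expectation: str   # what we predicted beforehand
    outcome: str       # one of OUTCOMES
    note: str = ""     # what the result says about where the method holds

    def __post_init__(self):
        # Reject free-form outcomes so failures cannot be relabelled away.
        if self.outcome not in OUTCOMES:
            raise ValueError(f"outcome must be one of {OUTCOMES}, got {self.outcome!r}")


def to_jsonl(entries) -> str:
    """Serialise entries as JSON Lines, one record per line, suitable for appending."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


def outcome_counts(entries) -> Counter:
    """Count entries per outcome, making the share of negative results visible."""
    return Counter(e.outcome for e in entries)
```

For example, `outcome_counts([LedgerEntry("approach-a", "expected to help", "not_supported"), LedgerEntry("approach-b", "expected to help", "supported")])` reports one entry under each outcome. A ledger whose counts show only `supported` is itself a warning sign.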

What this is for

None of this is glamorous and none of it is novel; these are ordinary norms of careful science. Stating them is the point. An independent lab earns the right to be taken seriously not by asserting rigour but by leaving behind the artefacts of it — timestamps, reproducible runs, a complete ledger of dead ends — for anyone who cares to look.

What the discipline led to is checkable results instead of claims: 91% of the accuracy at 1% of the cost, the Cooperative Survival Network, and why 11,589 tests.
