Engineering

Seven Layers: A Memory Model Takes Shape

Alexander Bering

January 28, 2025 · 3 min read

From a stack of references to a model

The 2024 sketch had the right instincts and no edges. Over the following months it hardened into something we could actually reason about: seven memory layers, each with a defined role, and a handful of principles governing how information moves between them.

The seven layers, in plain terms:

Working memory — the active focus of the current task.
Short-term memory — the context of the current session.
Episodic memory — concrete experiences, with their time and surroundings.
Long-term memory — the durable, distilled knowledge.
Procedural memory — learned routines and how-to.
Core memory — a small set of pinned, always-present facts.
A predictive layer — the part that anticipates and revises, which we left deliberately underspecified at this stage.

The number seven is not sacred. It is the smallest set that gave each documented function its own home without forcing two unlike things to share one.
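To make the division concrete, here is a minimal Python sketch of the seven layers as a typed enumeration. The identifiers `Layer`, `SESSION_SCOPED`, and `survives_session_end` are hypothetical. The rule that only working and short-term memory are cleared when a session ends is an assumption read off the role descriptions above, not a documented policy.

```python
from enum import Enum


class Layer(Enum):
    """The seven memory layers named above (identifiers are illustrative)."""
    WORKING = "working"        # active focus of the current task
    SHORT_TERM = "short_term"  # context of the current session
    EPISODIC = "episodic"      # concrete experiences, with time and surroundings
    LONG_TERM = "long_term"    # durable, distilled knowledge
    PROCEDURAL = "procedural"  # learned routines and how-to
    CORE = "core"              # small set of pinned, always-present facts
    PREDICTIVE = "predictive"  # anticipates and revises; deliberately underspecified


# Hypothetical policy: layers whose contents are cleared when a session ends.
SESSION_SCOPED = {Layer.WORKING, Layer.SHORT_TERM}


def survives_session_end(layer: Layer) -> bool:
    """Return True if items stored in this layer outlive the current session."""
    return layer not in SESSION_SCOPED
```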

Consolidation, not retrieval, at the centre

The design decision we are most committed to is also the least obvious: the interesting work happens between writes and reads, not during them.

Most systems do everything at write time or query time. We pushed the important transformations into a separate, scheduled process — a consolidation pass, in the spirit of sleep consolidation, that replays recent experience, strengthens what is used, lets the incidental fade, and promotes episodic detail into semantic knowledge. Retrieval then operates over a store that has already been organised, rather than over raw sediment.

This is a methodological choice as much as a biological analogy. It means the system's behaviour over weeks is governed by a process we can inspect and tune, not by an accident of what happened to be written down.
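A consolidation pass of this kind can be sketched in a few lines. This is a toy under stated assumptions, not the system's implementation. `Episode`, `Store`, `consolidate`, and every threshold are hypothetical, and "replay" here is simply a walk over the episodic buffer. Promotion into semantic knowledge is approximated by a repetition count: a fact that appears in several surviving episodes is copied into long-term memory.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Episode:
    fact: str             # distilled content of the experience (hypothetical field)
    strength: float = 1.0
    uses: int = 0         # retrievals since the last consolidation pass


@dataclass
class Store:
    episodic: list = field(default_factory=list)   # list of Episode
    long_term: set = field(default_factory=set)    # promoted facts


def consolidate(store: Store, boost: float = 0.5, decay: float = 0.8,
                floor: float = 0.3, promote_after: int = 2) -> None:
    """One scheduled pass: strengthen used items, fade unused ones, promote repeats."""
    kept = []
    for ep in store.episodic:              # "replay" the episodic buffer
        if ep.uses:
            ep.strength += boost * ep.uses  # used memories get stronger
        else:
            ep.strength *= decay            # incidental ones fade
        ep.uses = 0                         # reset the usage window
        if ep.strength >= floor:            # drop anything that faded too far
            kept.append(ep)
    store.episodic = kept

    # Promotion: a fact seen in several surviving episodes becomes long-term knowledge.
    counts = Counter(ep.fact for ep in store.episodic)
    for fact, n in counts.items():
        if n >= promote_after:
            store.long_term.add(fact)
```

Because the pass runs on its own schedule rather than inside writes or queries, parameters such as `decay` and `promote_after` become the knobs one would inspect and tune.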

Confidence as a first-class output

The second principle: a memory system should know how sure it is. Recall is not binary. A well-timed review makes a memory more retrievable; time without review makes it less so; some facts are corroborated from several directions and some rest on a single mention.

So from early on we treated confidence as something to be propagated and reported: a calibrated estimate attached to what the system returns, rather than a single best guess presented with uniform certainty. A system that can say "I am fairly sure, on weak evidence" is more useful than one that answers everything in the same tone, and it makes its own limits legible.
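One way to turn these observations into a reported number is sketched below. It assumes an exponential forgetting curve for the effect of time and review, and a noisy-OR rule for corroboration. Both models, the names (`Recall`, `retrievability`, `after_review`, `corroboration`, `recall`), and the constants are illustrative assumptions. A score built this way would still need to be calibrated against observed recall before it deserves the word.

```python
import math
from dataclasses import dataclass


@dataclass
class Recall:
    value: str
    confidence: float  # estimate in [0, 1]
    evidence: str      # "weak" or "strong", reported alongside the answer


def retrievability(days_since_review: float, stability_days: float) -> float:
    """Exponential forgetting curve: 1.0 right after a review, decaying with time."""
    return math.exp(-days_since_review / stability_days)


def after_review(stability_days: float, factor: float = 2.0) -> float:
    """A well-timed review makes the memory more durable (hypothetical factor)."""
    return stability_days * factor


def corroboration(sources: int, per_source: float = 0.6) -> float:
    """Noisy-OR: each independent mention adds support; a single mention stays modest."""
    return 1.0 - (1.0 - per_source) ** sources


def recall(value: str, days_since_review: float, stability_days: float,
           sources: int) -> Recall:
    """Combine decay and corroboration into one score plus an evidence label."""
    conf = retrievability(days_since_review, stability_days) * corroboration(sources)
    evidence = "strong" if sources >= 3 else "weak"  # hypothetical threshold
    return Recall(value, round(conf, 3), evidence)
```

For example, `recall("x", 0.0, 10.0, 1)` describes a fact reviewed today but mentioned only once. It comes back with confidence 0.6 and evidence `"weak"`, the programmatic form of "fairly sure, on weak evidence".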

Still an open question

In January 2025 this was a model and an early implementation, not a result. The principles were defensible on paper; whether they would outperform a plain vector store under real load was unproven. But the model now had edges sharp enough to test — which is the only state from which a research question can move forward.

The model became running code, layer by layer: The 7-Layer Architecture, The Art of Forgetting, and Sleep Consolidation.

Related Articles

From AI Overview to a Real Demo: Turning a Blueprint into a ZenAi Instance — Autonomously

The Stress Test: Why We Ran Ten Security Sprints Before the First Customer

91 % of the accuracy at 1 % of the tokens — the Pareto position for AI memory