Engineering

How We Built Memory Into an AI OS: The 9 Neuroscience Algorithms Behind ZenBrain

Alexander Bering
April 4, 2026 · 7 min read

Most AI systems forget everything the moment a conversation ends.

That's not a bug — it's a design choice. But it's the wrong one. Knowledge workers don't think in isolated sessions. They build context over weeks, revisit half-formed ideas, connect a meeting from Tuesday with a document from last month.

So when we built ZenBrain — the open-source memory system at the core of ZenAI — we decided not to paper over this problem with longer context windows or naive vector search. We went deeper.

We read the neuroscience.

Why Neuroscience?

The human brain has been solving the "what to remember, what to forget, and when to retrieve it" problem for 300 million years. It does this with remarkable efficiency — not by storing everything, but by selectively consolidating, linking, and decaying information based on relevance and use.

Modern AI memory systems mostly use one of three approaches: RAG (dump everything into a vector DB), fine-tuning (bake knowledge into weights), or naive key-value stores. All three have serious limitations for long-running personal AI.

We implemented 9 algorithms — each grounded in a specific peer-reviewed neuroscience or information theory paper — that work together as a coherent memory architecture.

The 9 Algorithms

During Conversation (Active, Real-Time)

#A — vmPFC Prediction-Error Coupled FSRS Zou et al., Cell Reports 2025 (7T fMRI study)

Standard spaced repetition systems (Anki, FSRS-5, SuperMemo) schedule reviews based on recall history. But a 2025 fMRI study found that the vmPFC encodes re-encoding similarity — how much a new experience matches a past prediction — and this signal, not repetition count, drives which memories get strengthened.

We coupled this prediction error (PE) signal to the FSRS scheduling algorithm: when your Knowledge Graph shows high PE at review time (the information surprised you), the interval shortens — ideal learning window. Low PE means the information was expected, so the interval extends. To our knowledge, this is the first biologically motivated adaptive FSRS extension in any software system.
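As a rough sketch of the coupling (the function name, scaling factors, and linear interpolation are illustrative assumptions, not the actual ZenBrain scheduler): take a base interval from a standard FSRS pass and modulate it by a normalized PE signal, so surprising material comes back sooner and expected material gets pushed out.

```typescript
// Hypothetical sketch: modulate an FSRS review interval by a
// prediction-error (PE) signal in [0, 1]. High PE (surprising
// information) shortens the interval; low PE extends it.
// `baseIntervalDays` would come from a standard FSRS scheduler.

function peAdjustedInterval(
  baseIntervalDays: number,
  predictionError: number, // 0 = fully expected, 1 = maximally surprising
  maxShrink = 0.5,         // high PE can halve the interval
  maxGrow = 1.5,           // low PE can extend it by 50%
): number {
  const pe = Math.min(1, Math.max(0, predictionError));
  // Linear interpolation: pe = 1 -> maxShrink, pe = 0 -> maxGrow.
  const factor = maxGrow + (maxShrink - maxGrow) * pe;
  return Math.max(1, Math.round(baseIntervalDays * factor));
}

// A surprising review comes back sooner than an expected one.
const surprising = peAdjustedInterval(10, 0.9); // 6 days
const expected = peAdjustedInterval(10, 0.1);   // 14 days
```

The key design point is that PE modulates, rather than replaces, the recall-history-based schedule.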

#B — Compositional Context Embeddings Nature 2025 (Compositional PFC) + bioRxiv 2025 (Orthogonal Neural Codes)

ZenAI has 4 contexts: personal, work, learning, creative. The naive approach is to completely isolate them. But that means insights from your learning context can never inform work — the opposite of what a human brain does.

The neuroscience shows that prefrontal cortex encodes different task contexts in orthogonal neural subspaces — separate but not hermetically sealed. We implement this as:

h(context, memory) = P_shared × embedding + Q_context × context_code

where P^T × Q = 0 (mathematical orthogonality). Cross-context transfer is permitted through the shared subspace; contamination is blocked by the orthogonal context codes.
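One cheap way to see how the orthogonality constraint works (purely illustrative; the real system would learn these projections rather than hard-code them): realize P_shared and Q_context as projections onto disjoint coordinate blocks, which makes P^T × Q = 0 hold by construction. The dimensions and context codes below are made up for the example.

```typescript
// Illustrative sketch: shared and context subspaces as disjoint
// coordinate blocks, so orthogonality holds trivially.

const SHARED_DIM = 4;

type Context = "personal" | "work" | "learning" | "creative";

// Hypothetical fixed context codes, one per context.
const CONTEXT_CODES: Record<Context, number[]> = {
  personal: [1, 0],
  work: [0, 1],
  learning: [1, 1],
  creative: [-1, 1],
};

// h(context, memory): the shared block carries the embedding,
// the context block carries the orthogonal context code.
function compose(embedding: number[], context: Context): number[] {
  if (embedding.length !== SHARED_DIM) throw new Error("bad embedding size");
  return [...embedding, ...CONTEXT_CODES[context]];
}

// Cross-context similarity flows only through the shared block;
// the context blocks never mix into this score.
function sharedSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < SHARED_DIM; i++) dot += a[i] * b[i];
  return dot;
}
```

Two memories with the same content embedding stay comparable across contexts through the shared block, while their context codes keep the subspaces from contaminating each other.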

#G — iMAD Selective Debate Protocol arXiv 2511.11306, Nov 2025

Multi-Agent Debate (MAD) improves AI answer quality, but it's expensive — you're running the same query through multiple models and synthesizing. We implemented a selective trigger: a single agent first generates self-critique and hesitation features (confidence gap, hedging language, detected contradictions). A lightweight classifier decides whether to escalate to full debate or accept the initial response.

Result: ~92% token cost reduction while maintaining ~13.5% accuracy improvement over single-agent. Debate only happens when it's actually needed.
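The gate itself can be sketched in a few lines (the feature names, weights, and threshold here are stand-ins for the trained classifier, not values from the paper):

```typescript
// Hypothetical sketch of a selective-debate gate: score cheap
// "hesitation" features from a single agent's draft answer and
// escalate to multi-agent debate only above a threshold.

interface HesitationFeatures {
  confidenceGap: number;   // 1 - self-reported confidence, in [0, 1]
  hedgingDensity: number;  // fraction of hedging phrases ("might", "possibly")
  contradictions: number;  // count of detected self-contradictions
}

function shouldEscalateToDebate(
  f: HesitationFeatures,
  threshold = 0.5,
): boolean {
  // Lightweight linear classifier standing in for a trained model.
  const score =
    0.5 * f.confidenceGap +
    0.3 * f.hedgingDensity +
    0.2 * Math.min(1, f.contradictions / 3);
  return score > threshold;
}
```

A confident, non-hedging draft answer short-circuits straight to the user; only hesitant drafts pay the multi-model debate cost.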

#H — Metacognitive HyperAgent arXiv 2603.19461, Meta AI, March 2026

This one is recursive. The system doesn't just improve task performance — it improves the improvement mechanism itself. The Metacognitive HyperAgent analyzes strategy performance patterns, generates meta-insights about its own reasoning, and proposes structural changes to how it approaches future tasks.

Budget-constrained to max 3 meta-improvements per day with governance-layer approval requirements. Unbounded self-modification is how you get misaligned systems; bounded meta-cognition with human-in-the-loop is how you get trustworthy ones.
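The governance gate described above is conceptually simple; a minimal sketch (with hypothetical types, since the real governance layer is more involved) looks like this:

```typescript
// Sketch of the budget/governance gate: at most N structural
// meta-improvements per day, each requiring explicit approval.

interface MetaImprovement {
  description: string;
  approvedBy: string | null; // governance-layer sign-off, or null
}

class MetaBudget {
  private appliedToday = 0;
  constructor(private readonly maxPerDay = 3) {}

  tryApply(change: MetaImprovement): boolean {
    if (change.approvedBy === null) return false;          // human-in-the-loop
    if (this.appliedToday >= this.maxPerDay) return false; // daily cap
    this.appliedToday += 1;
    return true;
  }

  resetDay(): void {
    this.appliedToday = 0;
  }
}
```

The point is that both conditions are hard gates: no approval means no change, and the daily cap holds even for approved changes.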

During Sleep (Nightly Consolidation Pipeline)

Every night, when the system is idle, a BullMQ job triggers a 5-stage consolidation pipeline that runs the remaining 5 algorithms in sequence.

#C — Simulation-Selection Sleep Loop Frontiers in Computational Neuroscience 2025

Inspired by hippocampal replay during sleep. Stage 1 (CA3-analog): generate diverse replay candidates from the episodic buffer, including counterfactual variations of failed interactions. Stage 2 (CA1-analog): score candidates using:

Tag(episode) = α × |TD_error| + β × Reward + γ × Novelty

High-scoring replays receive LTP (Long-Term Potentiation — strengthening). Low-scoring replays receive LTD (Long-Term Depression — decay). The system literally replays its day and decides what mattered.
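The scoring stage maps directly onto the tagging equation above. A sketch with illustrative coefficients (α, β, γ and the threshold are assumptions for the example, not the tuned values):

```typescript
// Sketch of the CA1-analog scoring stage:
// Tag = alpha * |TD error| + beta * reward + gamma * novelty.
// Episodes above a threshold are potentiated (LTP); below it, decayed (LTD).

interface ReplayCandidate {
  id: string;
  tdError: number;
  reward: number;
  novelty: number; // in [0, 1]
}

function tagScore(
  e: ReplayCandidate,
  alpha = 1.0,
  beta = 0.5,
  gamma = 0.3,
): number {
  return alpha * Math.abs(e.tdError) + beta * e.reward + gamma * e.novelty;
}

// Partition a night's replay candidates into strengthen/decay sets.
function selectReplays(
  candidates: ReplayCandidate[],
  threshold = 0.5,
): { ltp: ReplayCandidate[]; ltd: ReplayCandidate[] } {
  return {
    ltp: candidates.filter((e) => tagScore(e) >= threshold),
    ltd: candidates.filter((e) => tagScore(e) < threshold),
  };
}
```

A surprising failure (large TD error) outranks a routine success, which is exactly the behavior you want from replay: the system rehearses what it got wrong.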

#D — Two-Factor Synaptic Model Zenke et al., PNAS 2025

Standard Knowledge Graph edges have a single weight. We extended them to (weight, variance) pairs. Variance decreases with activation frequency — an edge you've traversed 100 times has low variance (mature, stable). An edge you've traversed twice has high variance (uncertain).

The importance score 1/variance serves as a Fisher Information proxy, making this closely analogous to Elastic Weight Consolidation (EWC) — a continual learning technique that prevents catastrophic forgetting by protecting mature connections. New learning can't overwrite well-established knowledge.
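A minimal sketch of the two-factor edge, with hypothetical update rules (the decay constant and damping form are illustrative): each traversal shrinks variance, and the learning rate for weight updates is damped by the 1/variance importance score.

```typescript
// Sketch of a (weight, variance) Knowledge Graph edge.
// High variance = uncertain/plastic; low variance = mature/stable.

interface Edge {
  weight: number;
  variance: number;
}

function traverse(edge: Edge): Edge {
  // Activation reduces uncertainty (floor keeps variance positive).
  return { weight: edge.weight, variance: Math.max(0.01, edge.variance * 0.9) };
}

function learn(edge: Edge, target: number, baseRate = 0.5): Edge {
  // Importance ~ 1/variance: mature edges resist being overwritten,
  // in the spirit of EWC's importance-weighted protection.
  const importance = 1 / edge.variance;
  const rate = baseRate / (1 + importance);
  return {
    weight: edge.weight + rate * (target - edge.weight),
    variance: edge.variance,
  };
}
```

Given the same conflicting signal, a fresh edge moves substantially while a mature edge barely budges — which is the whole point of the second factor.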

#F — Context-Adaptive Information Bottleneck Budget MemFly, February 2026 (Information Bottleneck)

Information Bottleneck theory gives us a formal way to decide what to compress: retain if I(Z;Y) × β > I(X;Z). The β parameter controls the relevance/compression tradeoff.

We made β context-dependent:

| Context  | β   | Philosophy                                 |
|----------|-----|--------------------------------------------|
| Work     | 0.8 | Retain anything potentially relevant       |
| Learning | 0.6 | Balanced — compression creates abstraction |
| Personal | 0.4 | Forgetting irrelevant data is a feature    |
| Creative | 0.3 | Maximum compression to pure concepts       |
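The retention rule itself is a one-liner once the mutual-information terms are estimated. A sketch using the per-context betas above (the estimation of I(Z;Y) and I(X;Z) is assumed to happen elsewhere and is the genuinely hard part):

```typescript
// Sketch of the context-adaptive Information Bottleneck filter:
// retain a compressed representation Z if beta * I(Z;Y) > I(X;Z).

type MemoryContext = "work" | "learning" | "personal" | "creative";

const BETA: Record<MemoryContext, number> = {
  work: 0.8,
  learning: 0.6,
  personal: 0.4,
  creative: 0.3,
};

function shouldRetain(
  context: MemoryContext,
  relevanceIZY: number,  // I(Z;Y): how much Z tells us about the target Y
  complexityIXZ: number, // I(X;Z): how much of the raw input Z still carries
): boolean {
  return BETA[context] * relevanceIZY > complexityIXZ;
}
```

Note how the same memory can survive in a work context but be compressed away in a creative one: a higher β makes the relevance side of the inequality easier to win.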

#I — Dual-Process CoT Consolidation arXiv July 2025 (Dual-Process Compositional Learning)

Chain-of-Thought reasoning produces valuable intermediate steps that are usually discarded after the response. We formalized the path from CoT to persistent knowledge:

  • Phase 1 (Hippocampal): successful reasoning chains → episodic memory (high fidelity, exact)
  • Phase 2 (Cortical, during sleep): abstract successful chains → schema nodes in the Knowledge Graph
  • Failed chains remain episodic for potential replay and learning (see #C)

This implements the hippocampus-to-cortex transfer model: fast binding first, slow cortical consolidation later.
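The two phases above can be sketched as follows (the types are hypothetical, and the "abstraction" step is a trivial stand-in for the real schema extraction):

```typescript
// Sketch of dual-process CoT consolidation: successful chains are
// stored verbatim (episodic), then abstracted into schema nodes
// during sleep; failed chains stay episodic for replay.

interface ReasoningChain {
  steps: string[];
  succeeded: boolean;
}

interface MemoryStores {
  episodic: ReasoningChain[];
  schemas: string[]; // abstracted Knowledge Graph schema nodes
}

// Phase 1 (hippocampal): fast, high-fidelity binding.
function storeEpisodic(store: MemoryStores, chain: ReasoningChain): void {
  store.episodic.push(chain);
}

// Phase 2 (cortical, during sleep): abstract successful chains only.
function consolidate(store: MemoryStores): void {
  for (const chain of store.episodic) {
    if (chain.succeeded) {
      // Stand-in for real abstraction: keep the endpoints of the chain.
      store.schemas.push(`${chain.steps[0]} -> ${chain.steps[chain.steps.length - 1]}`);
    }
    // Failed chains remain episodic for replay (#C).
  }
}
```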

#E — Spectral KG Health Monitor Nature Communications 2023 (Causal Hubs) + Spectral Graph Theory

After each sleep cycle, we check: did consolidation make the Knowledge Graph healthier or more fragmented? We use the algebraic connectivity (Fiedler value, λ₂) of the Graph Laplacian L = D - A.

Rising λ₂ after sleep = successful consolidation (graph is better connected). Falling λ₂ = fragmentation alert — the system over-pruned or created disconnected components. This gives us an objective, mathematically-grounded quality metric for memory health.
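For small graphs the Fiedler value can be computed without any dependency at all. The sketch below builds L = D - A and diagonalizes it with plain Jacobi rotations (a textbook method for symmetric matrices; a production system would use a sparse eigensolver instead):

```typescript
// Sketch: Fiedler value (second-smallest eigenvalue of L = D - A)
// for a small undirected graph, via Jacobi eigenvalue rotations.

function laplacian(adjacency: number[][]): number[][] {
  return adjacency.map((row, i) =>
    row.map((a, j) => (i === j ? row.reduce((s, x) => s + x, 0) : -a)),
  );
}

// Jacobi rotations: repeatedly zero the largest off-diagonal entry
// of a symmetric matrix until it is (nearly) diagonal.
function symmetricEigenvalues(m: number[][], maxRotations = 100): number[] {
  const n = m.length;
  const a = m.map((row) => [...row]);
  for (let iter = 0; iter < maxRotations; iter++) {
    // Locate the largest off-diagonal element a[p][q].
    let p = 0, q = 1, max = 0;
    for (let i = 0; i < n; i++)
      for (let j = i + 1; j < n; j++)
        if (Math.abs(a[i][j]) > max) { max = Math.abs(a[i][j]); p = i; q = j; }
    if (max < 1e-12) break; // effectively diagonal
    // Rotation angle that zeroes a[p][q].
    const theta = 0.5 * Math.atan2(2 * a[p][q], a[q][q] - a[p][p]);
    const c = Math.cos(theta), s = Math.sin(theta);
    for (let k = 0; k < n; k++) { // A <- A * J (column update)
      const akp = a[k][p], akq = a[k][q];
      a[k][p] = c * akp - s * akq;
      a[k][q] = s * akp + c * akq;
    }
    for (let k = 0; k < n; k++) { // A <- J^T * A (row update)
      const apk = a[p][k], aqk = a[q][k];
      a[p][k] = c * apk - s * aqk;
      a[q][k] = s * apk + c * aqk;
    }
  }
  return a.map((row, i) => row[i]).sort((x, y) => x - y);
}

function fiedlerValue(adjacency: number[][]): number {
  return symmetricEigenvalues(laplacian(adjacency))[1];
}
```

A triangle gives λ₂ = 3 (tightly connected), a path of three nodes gives λ₂ = 1, and any disconnected graph gives λ₂ = 0, which is exactly what makes it a good fragmentation alarm.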

How It All Fits Together

Conversation
 #A vmPFC-FSRS        when to review this memory
 #B Compositional     which context subspace
 #G iMAD              is debate needed?
 #H HyperAgent        bounded self-improvement

Night (Sleep Pipeline)
 Stage 1: #C  Simulate + select replays
 Stage 2: #D  Two-factor synaptic consolidation
 Stage 3: #F  IB Budget filter per context
 Stage 4: #I  Abstract CoT chains to KG schemas
 Stage 5: #E  Spectral health check (Fiedler value)

Every algorithm is registered in an ablation toggle registry for controlled experiments. Each can be independently disabled to measure its contribution — a requirement for any serious NeurIPS submission.

What Makes This Different

No existing AI memory system combines all of these. Mem0, Letta, Zep, and MemGPT are excellent systems, but they operate primarily at the retrieval layer. ZenBrain operates at all four levels:

  1. Encoding — what gets stored and how (orthogonal subspaces, context-adaptive)
  2. Consolidation — what gets strengthened during offline processing (sleep pipeline)
  3. Retrieval — what gets surfaced and when (PE-coupled FSRS, hybrid BM25+semantic)
  4. Forgetting — what gets pruned and why (IB budget, synaptic decay)

Controlled forgetting is not a weakness. It's the feature that makes the rest work.

Open Source

ZenBrain is Apache-2.0-licensed and published as zero-dependency npm packages:

npm install @zensation/algorithms @zensation/core
# PostgreSQL adapter
npm install @zensation/adapter-postgres
# SQLite (zero-config)
npm install @zensation/adapter-sqlite

The full technical report and algorithm implementations are in the repo. NeurIPS 2026 submission in progress (abstract deadline May 4).

We're building in public. Questions, critiques, and collaboration welcome.

Want to go deeper? Read our post on the 7-Layer Memory Architecture or the Science Behind ZenBrain.