How We Built a Memory System That Thinks Like a Brain

Alexander Bering
March 29, 2026 · 7 min read

The Starting Point: A Vector Store

Like every AI project in 2024, we started with a simple pattern: embed text, store vectors, retrieve by similarity. It took about two weeks to have a working prototype.

And about three weeks to realize it wasn't enough.

The problem wasn't retrieval accuracy. Cosine similarity works surprisingly well for finding relevant text. The problem was everything else:

  • The system couldn't forget. Irrelevant notes from months ago polluted every query.
  • It had no sense of time. A fact learned yesterday had the same weight as one from last year.
  • It couldn't connect ideas. Two related concepts stored separately stayed separate forever.
  • It didn't get better with use. The more you stored, the noisier it got.

This is the dirty secret of most AI memory implementations: they scale in storage, but they degrade in quality. More data doesn't make them smarter. It makes them louder.

The Neuroscience Detour

I spent two months reading neuroscience papers instead of writing code. It was the best investment of the entire project.

The human brain doesn't have one memory system. It has at least five, depending on how you count. Each one serves a different purpose, operates on different timescales, and fails in different ways. The interplay between them is what creates intelligent memory.

Three papers changed everything:

Stickgold & Walker (2013) showed that memory consolidation during sleep is critical for learning. The brain doesn't just store memories while you sleep; it actively replays them, strengthens important connections, and prunes weak ones. This is why you sometimes wake up with a solution to a problem you were stuck on.

Ebbinghaus (1885) quantified forgetting. The forgetting curve is exponential, but each review resets and flattens it. This became the foundation for our decay model.

Hebb (1949) proposed that neurons that fire together wire together. This simple principle, that co-activation strengthens connections, became our knowledge graph model.

Building Layer by Layer

We didn't build all seven layers at once. We started with three (Working, Episodic, Long-Term) and added the others as we understood the problems they solved.

The Working Memory Problem

Working memory is the easiest to conceptualize and the hardest to implement correctly. In humans, working memory holds roughly seven items (Miller's classic 7 ± 2) and constantly refreshes. In an AI system, the question is: what counts as an "item"?

Our solution: Working memory is session-scoped, capacity-limited, and priority-ordered. When the system is processing a query, working memory holds the most relevant recent context. When you switch topics, it clears and reloads. This prevents the "topic contamination" problem where context from one conversation bleeds into another.
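To make the idea concrete, here is a minimal sketch of a session-scoped, capacity-limited, priority-ordered buffer. The names (`WorkingMemory`, `MemoryItem`) and the eviction policy are illustrative assumptions, not the actual ZenBrain API:

```typescript
// Illustrative working-memory buffer: bounded capacity, priority-ordered,
// cleared on topic switch. Not the real ZenBrain implementation.
interface MemoryItem {
  id: string;
  content: string;
  priority: number; // higher = more relevant to the current topic
}

class WorkingMemory {
  private items: MemoryItem[] = [];
  constructor(private readonly capacity: number = 7) {}

  add(item: MemoryItem): void {
    this.items.push(item);
    // Keep the buffer sorted by priority and evict the lowest-priority
    // item once capacity is exceeded.
    this.items.sort((a, b) => b.priority - a.priority);
    if (this.items.length > this.capacity) this.items.pop();
  }

  // Called on topic switch: prevents context from one conversation
  // bleeding into another.
  clear(): void {
    this.items = [];
  }

  snapshot(): MemoryItem[] {
    return [...this.items];
  }
}
```

The key design choice is that eviction is by priority rather than recency, so a highly relevant older item can outlive a low-relevance newer one.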

The Episodic vs. Semantic Split

This was Tulving's key insight: "I had pasta for lunch" is fundamentally different from "pasta is an Italian food." The first is a personal experience tied to time and place. The second is abstract knowledge.

We implemented this distinction. When you tell ZenBrain "we decided to use PostgreSQL in yesterday's meeting," it stores an episodic memory with timestamp, participants, emotional valence, and outcome. Over time, through consolidation, the abstract knowledge ("we use PostgreSQL") migrates to semantic memory, just like in the human brain, where repeated episodes crystallize into general knowledge.
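The split above can be sketched as two data shapes plus a consolidation step. The field names and the confidence heuristic are assumptions for illustration, not the actual ZenBrain schema:

```typescript
// Episodic: a personal experience tied to time, place, and people.
interface EpisodicMemory {
  what: string;
  when: Date;
  participants: string[];
  valence: number; // emotional valence, e.g. -1..1 (assumed range)
  outcome: string;
}

// Semantic: abstract knowledge, detached from any single episode.
interface SemanticFact {
  statement: string;
  supportCount: number; // how many episodes back this fact
  confidence: number;
}

// Consolidation sketch: repeated episodes crystallize into a semantic fact.
// The saturation-after-five-episodes rule is an illustrative assumption.
function consolidate(episodes: EpisodicMemory[], statement: string): SemanticFact {
  return {
    statement,
    supportCount: episodes.length,
    confidence: Math.min(1, episodes.length / 5),
  };
}
```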

The Forgetting Breakthrough

Counter-intuitively, the biggest quality improvement came from implementing forgetting.

Before active forgetting, every retrieval query had to wade through all stored memories. Signal-to-noise ratio degraded linearly with storage size. After implementing Ebbinghaus decay, memories that hadn't been accessed naturally faded. Not deleted, just deprioritized. If you accessed them again, they strengthened.

The effect was dramatic. Retrieval accuracy improved by approximately 30% simply by reducing noise. The system got smarter by removing things, not adding them.
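The decay-and-strengthen loop can be sketched as a classic exponential retention function, where each access resets the clock and raises a stability parameter, flattening the curve exactly as Ebbinghaus described. The constants (and the 1.5× stability boost) are illustrative assumptions, not ZenBrain's actual tuning:

```typescript
// Ebbinghaus-style retention: R = exp(-t / S), where t is time since the
// last access and S is a stability parameter that grows with each review.
function retention(hoursSinceAccess: number, stability: number): number {
  return Math.exp(-hoursSinceAccess / stability);
}

interface DecayingMemory {
  lastAccess: number; // hours on some monotonic clock
  stability: number;  // grows with each review, flattening the curve
}

// Accessing a memory strengthens it: reset the clock, raise stability.
// The 1.5x multiplier is an assumed constant for illustration.
function onAccess(m: DecayingMemory, now: number): DecayingMemory {
  return { lastAccess: now, stability: m.stability * 1.5 };
}

// Retrieval ranks by this score: stale memories fade, they are never deleted.
function score(m: DecayingMemory, now: number): number {
  return retention(now - m.lastAccess, m.stability);
}
```

Because the score feeds ranking rather than deletion, a faded memory that gets accessed again snaps back to full strength with a flatter future curve.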

Sleep Consolidation: The Differentiator

This is the feature that no competitor has, and the one I'm most proud of.

During idle periods (when the system isn't processing queries), the Sleep Compute Engine activates:

  1. Selection: Identifies recent memories with high importance that haven't been consolidated
  2. Replay: Re-evaluates each memory in the context of everything else the system knows
  3. Strengthening: Uses Hebbian dynamics to reinforce connections between related memories
  4. Pruning: Weakens connections that haven't been activated, eventually removing them
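The four phases above can be sketched as a single pass over memories and graph edges. All thresholds, multipliers, and type names here are assumptions for illustration, not the actual Sleep Compute Engine:

```typescript
interface Memory {
  id: string;
  importance: number;   // 0..1
  consolidated: boolean;
}

interface Edge {
  a: string;
  b: string;
  weight: number;
  activated: boolean;
}

// One consolidation pass: select, replay/strengthen, prune.
function consolidationPass(memories: Memory[], edges: Edge[]): Edge[] {
  // 1. Selection: important memories not yet consolidated (0.7 is assumed).
  const selected = memories.filter((m) => !m.consolidated && m.importance > 0.7);
  const selectedIds = new Set(selected.map((m) => m.id));

  // 2. Replay + 3. Strengthening: Hebbian boost for edges touching a
  // selected memory (1.2x is an illustrative constant).
  const updated = edges.map((e) =>
    selectedIds.has(e.a) || selectedIds.has(e.b)
      ? { ...e, weight: e.weight * 1.2, activated: true }
      : e
  );

  selected.forEach((m) => (m.consolidated = true));

  // 4. Pruning: weaken edges that weren't activated; drop ones that fade
  // below a floor (0.8x decay and 0.05 floor are assumed values).
  return updated
    .map((e) => (e.activated ? e : { ...e, weight: e.weight * 0.8 }))
    .filter((e) => e.weight > 0.05);
}
```

Run during idle periods, repeated passes let strong associations compound while unused ones fade out of the graph entirely.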

The result: a memory system that genuinely improves over time. Not because it stores more, but because it organizes better.

We benchmarked this against a control (same system without consolidation). After 30 days of simulated use, the consolidated system had 23% higher retrieval precision and 15% lower latency (fewer irrelevant memories to search through).

The RAG Pipeline

Memory is only half the equation. The other half is retrieval โ€” how you find the right memory at the right time.

Our RAG pipeline has evolved through 141 phases. The current version includes:

  • HyDE (Hypothetical Document Embeddings): Instead of searching for the query directly, we generate a hypothetical answer and search for that. This bridges the vocabulary gap between questions and stored knowledge.
  • Cross-Encoder Re-ranking: After initial retrieval, a cross-encoder model re-ranks results for semantic relevance.
  • Self-RAG Critique: If confidence is below 0.5, the system automatically reformulates the query and tries again.
  • Contextual Retrieval: Using the Anthropic method for document chunking, which improved retrieval accuracy by 67%.
  • Confidence Scoring: A 4-component score (topScore, avgScore, variance, diversity) that tells you how reliable the retrieved context is.
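As a rough illustration of how the last two pieces fit together, here is one plausible way to combine the four components (topScore, avgScore, variance, diversity) into a single score and gate the Self-RAG retry on it. The weighting is entirely an assumption; the post doesn't disclose the actual formula:

```typescript
// Hypothetical 4-component confidence score over retrieval similarities.
// High top and average scores raise confidence; high variance (a mix of
// good and bad hits) lowers it; diverse sources corroborate the answer.
function confidence(similarities: number[], sourceIds: string[]): number {
  const top = Math.max(...similarities);
  const avg = similarities.reduce((s, x) => s + x, 0) / similarities.length;
  const variance =
    similarities.reduce((s, x) => s + (x - avg) ** 2, 0) / similarities.length;
  const diversity = new Set(sourceIds).size / sourceIds.length;
  const raw = 0.4 * top + 0.3 * avg - 0.2 * variance + 0.1 * diversity;
  return Math.max(0, Math.min(1, raw)); // clamp to [0, 1]
}

// Self-RAG style gate: below 0.5, reformulate the query and retry.
function shouldRetry(similarities: number[], sourceIds: string[]): boolean {
  return confidence(similarities, sourceIds) < 0.5;
}
```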

Each of these improvements was validated with A/B testing against real usage patterns.

The Numbers

After 141 development phases:

  • 9,200+ tests (backend + frontend + CLI, 0 failures)
  • 276 tests for the memory algorithms alone
  • 12 neuroscience algorithms in the open-source core
  • 7 memory layers with full cross-layer interaction
  • 60 AI tools across 14 categories
  • 4 context schemas with complete isolation
  • 95% confidence intervals for all probabilistic outputs

What We Got Wrong

Not everything worked on the first try.

Over-engineering episodic memory: Our first implementation stored too much context per episode. Processing slowed and storage bloated. We learned to capture the essential elements (what, when, who, outcome) and let semantic memory handle the rest.

Consolidation frequency: Initially, we ran consolidation every hour. This was too aggressive โ€” memories didn't have time to accumulate access patterns. We settled on daily consolidation with event-triggered micro-consolidations for high-importance items.

Knowledge graph normalization: Without normalization, frequently co-activated concepts would dominate the graph. A single topic could end up connected to everything, making the graph useless. Hebbian normalization with a logarithmic scale fixed this.
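A minimal sketch of the fix, assuming log compression of raw co-activation counts followed by per-node normalization (function names and the exact scheme are illustrative):

```typescript
// Log-scaled Hebbian weight: compresses large co-activation counts so hub
// concepts can't dominate the graph (1 -> ~0.69, 10 -> ~2.4, 1000 -> ~6.9).
function hebbianWeight(coActivations: number): number {
  return Math.log1p(coActivations);
}

// Normalize a node's outgoing edge weights to sum to 1, so no single
// topic swamps the neighborhood regardless of raw activity.
function normalizeEdges(counts: Map<string, number>): Map<string, number> {
  const weights = new Map<string, number>();
  let total = 0;
  for (const [id, c] of counts) {
    const w = hebbianWeight(c);
    weights.set(id, w);
    total += w;
  }
  for (const [id, w] of weights) {
    weights.set(id, total > 0 ? w / total : 0);
  }
  return weights;
}
```

With raw counts, a concept co-activated 1000× more often would carry 1000× the weight; after log scaling, the ratio collapses to roughly 10×, keeping the graph usable.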

Open Source

The algorithms are published as @zensation/algorithms and @zensation/core on npm. Zero external dependencies, TypeScript-native, MIT licensed.

npm install @zensation/algorithms @zensation/core

GitHub: github.com/zensation-ai/zenbrain

We believe that memory is too fundamental to be proprietary. If you're building AI applications, you shouldn't have to reinvent forgetting curves and spaced repetition from scratch.

What's Next

We're working on:

  • Retention curve visualization โ€” export Ebbinghaus curves as chart data
  • Cross-context entity merging โ€” detect when the same concept appears in different contexts
  • Adaptive consolidation โ€” adjust consolidation frequency based on usage patterns
  • Multi-agent shared memory โ€” let specialized AI agents share a common knowledge base

The gap between AI memory and human memory is still enormous. But it's shrinking.

If you're working on AI memory, we'd love to hear from you. Find us on GitHub, Twitter, or Discord.