How We Built a Memory System That Thinks Like a Brain

Alexander Bering
March 29, 2026 · 7 min read

The Starting Point: A Vector Store

Like every AI project in 2024, we started with a simple pattern: embed text, store vectors, retrieve by similarity. It took about two weeks to have a working prototype.

And about three weeks to realize it wasn't enough.

The problem wasn't retrieval accuracy. Cosine similarity works surprisingly well for finding relevant text. The problem was everything else:

  • The system couldn't forget. Irrelevant notes from months ago polluted every query.
  • It had no sense of time. A fact learned yesterday had the same weight as one from last year.
  • It couldn't connect ideas. Two related concepts stored separately stayed separate forever.
  • It didn't get better with use. The more you stored, the noisier it got.

This is the dirty secret of most AI memory implementations: they scale in storage, but they degrade in quality. More data doesn't make them smarter. It makes them louder.

The Neuroscience Detour

I spent two months reading neuroscience papers instead of writing code. It was the best investment of the entire project.

The human brain doesn't have one memory system. It has at least five, depending on how you count. Each one serves a different purpose, operates on different timescales, and fails in different ways. The interplay between them is what creates intelligent memory.

Three papers changed everything:

Stickgold & Walker (2013) showed that memory consolidation during sleep is critical for learning. The brain doesn't just store memories while you sleep; it actively replays them, strengthens important connections, and prunes weak ones. This is why you sometimes wake up with a solution to a problem you were stuck on.

Ebbinghaus (1885) quantified forgetting. The forgetting curve is exponential, but each review resets and flattens it. This became the foundation for our decay model.

Hebb (1949) proposed that neurons that fire together wire together. This simple principle, that co-activation strengthens connections, became our knowledge graph model.

Building Layer by Layer

We didn't build all seven layers at once. We started with three (Working, Episodic, Long-Term) and added the others as we understood the problems they solved.

The Working Memory Problem

Working memory is the easiest to conceptualize and the hardest to implement correctly. In humans, working memory holds roughly seven items (Miller's classic 7 ± 2) and constantly refreshes. In an AI system, the question is: what counts as an "item"?

Our solution: Working memory is session-scoped, capacity-limited, and priority-ordered. When the system is processing a query, working memory holds the most relevant recent context. When you switch topics, it clears and reloads. This prevents the "topic contamination" problem where context from one conversation bleeds into another.
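To make the idea concrete, here is a minimal sketch of a session-scoped, capacity-limited, priority-ordered buffer. The names (`WorkingMemory`, `MemoryItem`) and the eviction policy are illustrative assumptions, not the actual ZenBrain API:

```typescript
// Illustrative working-memory buffer: bounded capacity, priority-ordered,
// cleared on topic switch. Not the real ZenBrain implementation.
interface MemoryItem {
  id: string;
  content: string;
  priority: number; // higher = more relevant to the current topic
}

class WorkingMemory {
  private items: MemoryItem[] = [];
  constructor(private readonly capacity: number = 7) {}

  add(item: MemoryItem): void {
    this.items.push(item);
    // Keep the buffer sorted by priority and evict the lowest-priority
    // item once capacity is exceeded.
    this.items.sort((a, b) => b.priority - a.priority);
    if (this.items.length > this.capacity) this.items.pop();
  }

  // Called on topic switch: prevents context from one conversation
  // bleeding into another.
  clear(): void {
    this.items = [];
  }

  snapshot(): MemoryItem[] {
    return [...this.items];
  }
}
```

The key design choice is that eviction is by priority rather than recency, so a highly relevant older item can outlive a low-relevance newer one.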

The Episodic vs. Semantic Split

This was Tulving's key insight: "I had pasta for lunch" is fundamentally different from "pasta is an Italian food." The first is a personal experience tied to time and place. The second is abstract knowledge.

We implemented this distinction. When you tell ZenBrain "we decided to use PostgreSQL in yesterday's meeting," it stores an episodic memory with timestamp, participants, emotional valence, and outcome. Over time, through consolidation, the abstract knowledge ("we use PostgreSQL") migrates to semantic memory, just like in the human brain, where repeated episodes crystallize into general knowledge.
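The split above can be sketched as two data shapes plus a consolidation step. The field names and the confidence heuristic are assumptions for illustration, not the actual ZenBrain schema:

```typescript
// Episodic: a personal experience tied to time, place, and people.
interface EpisodicMemory {
  what: string;
  when: Date;
  participants: string[];
  valence: number; // emotional valence, e.g. -1..1 (assumed range)
  outcome: string;
}

// Semantic: abstract knowledge, detached from any single episode.
interface SemanticFact {
  statement: string;
  supportCount: number; // how many episodes back this fact
  confidence: number;
}

// Consolidation sketch: repeated episodes crystallize into a semantic fact.
// The saturation-after-five-episodes rule is an illustrative assumption.
function consolidate(episodes: EpisodicMemory[], statement: string): SemanticFact {
  return {
    statement,
    supportCount: episodes.length,
    confidence: Math.min(1, episodes.length / 5),
  };
}
```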

The Forgetting Breakthrough

Counter-intuitively, the biggest quality improvement came from implementing forgetting.

Before active forgetting, every retrieval query had to wade through all stored memories. Signal-to-noise ratio degraded linearly with storage size. After implementing Ebbinghaus decay, memories that hadn't been accessed naturally faded. Not deleted, just deprioritized. If you accessed them again, they strengthened.

The effect was dramatic. Retrieval accuracy improved by approximately 30% simply by reducing noise. The system got smarter by removing things, not adding them.
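The decay-and-strengthen loop can be sketched as a classic exponential retention function, where each access resets the clock and raises a stability parameter, flattening the curve exactly as Ebbinghaus described. The constants (and the 1.5× stability boost) are illustrative assumptions, not ZenBrain's actual tuning:

```typescript
// Ebbinghaus-style retention: R = exp(-t / S), where t is time since the
// last access and S is a stability parameter that grows with each review.
function retention(hoursSinceAccess: number, stability: number): number {
  return Math.exp(-hoursSinceAccess / stability);
}

interface DecayingMemory {
  lastAccess: number; // hours on some monotonic clock
  stability: number;  // grows with each review, flattening the curve
}

// Accessing a memory strengthens it: reset the clock, raise stability.
// The 1.5x multiplier is an assumed constant for illustration.
function onAccess(m: DecayingMemory, now: number): DecayingMemory {
  return { lastAccess: now, stability: m.stability * 1.5 };
}

// Retrieval ranks by this score: stale memories fade, they are never deleted.
function score(m: DecayingMemory, now: number): number {
  return retention(now - m.lastAccess, m.stability);
}
```

Because the score feeds ranking rather than deletion, a faded memory that gets accessed again snaps back to full strength with a flatter future curve.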

Sleep Consolidation: The Differentiator

This is the feature that no competitor has, and the one I'm most proud of.

During idle periods (when the system isn't processing queries), the Sleep Compute Engine activates:

  1. Selection: Identifies recent memories with high importance that haven't been consolidated
  2. Replay: Re-evaluates each memory in the context of everything else the system knows
  3. Strengthening: Uses Hebbian dynamics to reinforce connections between related memories
  4. Pruning: Weakens connections that haven't been activated, eventually removing them
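The four phases above can be sketched as a single pass over memories and graph edges. All thresholds, multipliers, and type names here are assumptions for illustration, not the actual Sleep Compute Engine:

```typescript
interface Memory {
  id: string;
  importance: number;   // 0..1
  consolidated: boolean;
}

interface Edge {
  a: string;
  b: string;
  weight: number;
  activated: boolean;
}

// One consolidation pass: select, replay/strengthen, prune.
function consolidationPass(memories: Memory[], edges: Edge[]): Edge[] {
  // 1. Selection: important memories not yet consolidated (0.7 is assumed).
  const selected = memories.filter((m) => !m.consolidated && m.importance > 0.7);
  const selectedIds = new Set(selected.map((m) => m.id));

  // 2. Replay + 3. Strengthening: Hebbian boost for edges touching a
  // selected memory (1.2x is an illustrative constant).
  const updated = edges.map((e) =>
    selectedIds.has(e.a) || selectedIds.has(e.b)
      ? { ...e, weight: e.weight * 1.2, activated: true }
      : e
  );

  selected.forEach((m) => (m.consolidated = true));

  // 4. Pruning: weaken edges that weren't activated; drop ones that fade
  // below a floor (0.8x decay and 0.05 floor are assumed values).
  return updated
    .map((e) => (e.activated ? e : { ...e, weight: e.weight * 0.8 }))
    .filter((e) => e.weight > 0.05);
}
```

Run during idle periods, repeated passes let strong associations compound while unused ones fade out of the graph entirely.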

The result: a memory system that genuinely improves over time. Not because it stores more, but because it organizes better.

We benchmarked this against a control (same system without consolidation). After 30 days of simulated use, the consolidated system had 23% higher retrieval precision and 15% lower latency (fewer irrelevant memories to search through).

The RAG Pipeline

Memory is only half the equation. The other half is retrieval โ€” how you find the right memory at the right time.

Our RAG pipeline has evolved through 141 phases. The current version includes:

  • HyDE (Hypothetical Document Embeddings): Instead of searching for the query directly, we generate a hypothetical answer and search for that. This bridges the vocabulary gap between questions and stored knowledge.
  • Cross-Encoder Re-ranking: After initial retrieval, a cross-encoder model re-ranks results for semantic relevance.
  • Self-RAG Critique: If confidence is below 0.5, the system automatically reformulates the query and tries again.
  • Contextual Retrieval: Using the Anthropic method for document chunking, which improved retrieval accuracy by 67%.
  • Confidence Scoring: A 4-component score (topScore, avgScore, variance, diversity) that tells you how reliable the retrieved context is.
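As a rough illustration of how the last two pieces fit together, here is one plausible way to combine the four components (topScore, avgScore, variance, diversity) into a single score and gate the Self-RAG retry on it. The weighting is entirely an assumption; the post doesn't disclose the actual formula:

```typescript
// Hypothetical 4-component confidence score over retrieval similarities.
// High top and average scores raise confidence; high variance (a mix of
// good and bad hits) lowers it; diverse sources corroborate the answer.
function confidence(similarities: number[], sourceIds: string[]): number {
  const top = Math.max(...similarities);
  const avg = similarities.reduce((s, x) => s + x, 0) / similarities.length;
  const variance =
    similarities.reduce((s, x) => s + (x - avg) ** 2, 0) / similarities.length;
  const diversity = new Set(sourceIds).size / sourceIds.length;
  const raw = 0.4 * top + 0.3 * avg - 0.2 * variance + 0.1 * diversity;
  return Math.max(0, Math.min(1, raw)); // clamp to [0, 1]
}

// Self-RAG style gate: below 0.5, reformulate the query and retry.
function shouldRetry(similarities: number[], sourceIds: string[]): boolean {
  return confidence(similarities, sourceIds) < 0.5;
}
```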

Each of these improvements was validated with A/B testing against real usage patterns.

The Numbers

After 141 development phases:

  • 9,200+ tests (backend + frontend + CLI, 0 failures)
  • 276 tests for the memory algorithms alone
  • 12 neuroscience algorithms in the open-source core
  • 7 memory layers with full cross-layer interaction
  • 60 AI tools across 14 categories
  • 4 context schemas with complete isolation
  • 95% confidence intervals for all probabilistic outputs

What We Got Wrong

Not everything worked on the first try.

Over-engineering episodic memory: Our first implementation stored too much context per episode. Processing slowed and storage bloated. We learned to capture the essential elements (what, when, who, outcome) and let semantic memory handle the rest.

Consolidation frequency: Initially, we ran consolidation every hour. This was too aggressive โ€” memories didn't have time to accumulate access patterns. We settled on daily consolidation with event-triggered micro-consolidations for high-importance items.

Knowledge graph normalization: Without normalization, frequently co-activated concepts would dominate the graph. A single topic could end up connected to everything, making the graph useless. Hebbian normalization with a logarithmic scale fixed this.
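A minimal sketch of the fix, assuming log compression of raw co-activation counts followed by per-node normalization (function names and the exact scheme are illustrative):

```typescript
// Log-scaled Hebbian weight: compresses large co-activation counts so hub
// concepts can't dominate the graph (1 -> ~0.69, 10 -> ~2.4, 1000 -> ~6.9).
function hebbianWeight(coActivations: number): number {
  return Math.log1p(coActivations);
}

// Normalize a node's outgoing edge weights to sum to 1, so no single
// topic swamps the neighborhood regardless of raw activity.
function normalizeEdges(counts: Map<string, number>): Map<string, number> {
  const weights = new Map<string, number>();
  let total = 0;
  for (const [id, c] of counts) {
    const w = hebbianWeight(c);
    weights.set(id, w);
    total += w;
  }
  for (const [id, w] of weights) {
    weights.set(id, total > 0 ? w / total : 0);
  }
  return weights;
}
```

With raw counts, a concept co-activated 1000× more often would carry 1000× the weight; after log scaling, the ratio collapses to roughly 10×, keeping the graph usable.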

Open Source

The algorithms are published as @zensation/algorithms and @zensation/core on npm. Zero external dependencies, TypeScript-native, MIT licensed.

npm install @zensation/algorithms @zensation/core

GitHub: github.com/zensation-ai/zenbrain

We believe that memory is too fundamental to be proprietary. If you're building AI applications, you shouldn't have to reinvent forgetting curves and spaced repetition from scratch.

What's Next

We're working on:

  • Retention curve visualization โ€” export Ebbinghaus curves as chart data
  • Cross-context entity merging โ€” detect when the same concept appears in different contexts
  • Adaptive consolidation โ€” adjust consolidation frequency based on usage patterns
  • Multi-agent shared memory โ€” let specialized AI agents share a common knowledge base

The gap between AI memory and human memory is still enormous. But it's shrinking.

If you're working on AI memory, we'd love to hear from you. Find us on GitHub, Twitter, or Discord.