Engineering

Why We Wrote 9,200 Tests for a Solo Project

Alexander Bering
April 7, 2026 · 6 min read

9,228 tests. 24 intentionally skipped. 0 failures.

When I tell people this, the first reaction is usually: why? You're a solo developer. This is a side project. Nobody is paying you to write tests.

That reaction misunderstands what tests are for.

Tests Aren't About QA. They're About Velocity.

Here's the honest backstory: ZenAI is 141 phases of development, built over roughly 12 months. At phase 50, I had about 2,000 tests. At phase 100, around 5,000. At phase 141, 9,228.

The counter-intuitive discovery: the more tests I had, the faster I could ship new features.

Not slower. Faster.

When you have comprehensive test coverage, you can refactor fearlessly. You can add a new module without mentally tracking all the things that might break. You can merge a 200-file PR and immediately know whether anything regressed — not by running the app and clicking around for an hour, but in 45 seconds.

That's not overhead. That's a superpower.

The Composition

Backend   7,720 tests  (Jest, TypeScript)
Frontend  1,400 tests  (Vitest)
CLI         108 tests  (Jest)

Total     9,228 passed
Skipped      24 (all intentional)
Failed        0

The 24 intentional skips are documented:

  • 21 Docker sandbox tests (no Docker in CI)
  • 1 URL-fetch test that makes a real network request
  • 2 SSL certificate environment checks

I know exactly why each one is skipped. There are no "flaky tests we commented out."

What Gets Tested

The backend test suite covers 35 modules across 6 layers:

Integration tests hit actual route handlers with mocked databases. They test the full request/response cycle — authentication, validation, business logic, error handling.

Unit tests cover individual services: the FSRS scheduler, Hebbian dynamics, knowledge graph operations, RAG pipeline components, billing logic, memory consolidation.

Service tests mock external dependencies (Stripe, SendGrid, Anthropic API) but exercise the actual service logic. The billing service has 61 tests covering checkout, webhooks, credit deduction, and plan gating.

The frontend tests cover 8 React Query hook families, 15+ component behaviors, and 3 complex UI flows (chat streaming, idea management, settings persistence).

The Philosophy Behind the Numbers

I follow a simple rule: every PR that ships code must ship tests.

Not "write tests when you have time." Not "we'll add tests later." Every feature, every route, every service. Always.

This sounds obvious. Most developers agree with it in principle. Most don't practice it. There's always a reason: the deadline, the prototype, the "we'll refactor this anyway."

Those reasons accumulate into a codebase where you're afraid to change anything.

The Specific Patterns That Made It Work

1. Test the contract, not the implementation

I test what a function promises, not how it does it. If getSubscription(userId) should return a Subscription object with a plan field, that's the test. Not that it calls db.query with a specific SQL string.

This means tests survive refactoring. When I migrated from one ORM to raw queries, zero tests broke.
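The distinction can be sketched in a few lines. This is an illustrative stand-in, not ZenAI's actual billing code: the Subscription shape, getSubscription signature, and both store implementations are hypothetical.

```typescript
// Contract-style testing sketch (hypothetical names, not ZenAI's real API).

interface Subscription {
  plan: string;
  active: boolean;
}

// The contract: given a known user, the function yields a Subscription
// with a `plan` field. HOW it fetches that is deliberately unspecified.
type GetSubscription = (userId: string) => Subscription | null;

function satisfiesContract(getSubscription: GetSubscription): boolean {
  const sub = getSubscription("user-1");
  // Assert on the promise (shape), not the mechanism (SQL strings, ORM calls).
  return sub !== null && typeof sub.plan === "string";
}

// Implementation A: ORM-style in-memory lookup.
const ormStore: GetSubscription = (id) =>
  id === "user-1" ? { plan: "pro", active: true } : null;

// Implementation B: "raw query" stand-in with a different internal row shape.
const rawRows: ReadonlyArray<readonly [string, string, number]> = [
  ["user-1", "pro", 1],
];
const rawStore: GetSubscription = (id) => {
  const row = rawRows.find((r) => r[0] === id);
  return row ? { plan: row[1], active: row[2] === 1 } : null;
};

// Both implementations pass the same contract test, so swapping one for
// the other (e.g. migrating off an ORM) breaks zero tests.
console.log(satisfiesContract(ormStore), satisfiesContract(rawStore));
```

A test that instead asserted "ormStore calls db.query with SELECT * FROM subscriptions" would have failed the moment the ORM was removed, despite the behavior being identical.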

2. Mock at the boundary, not inside

External services (database, Stripe, Anthropic) are mocked at the module boundary. Everything else runs real. This catches logic errors without requiring real infrastructure.
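One way to get that boundary is dependency injection: the external client is an interface the service receives, and only that interface gets faked in tests. A minimal sketch, assuming hypothetical ChargeClient and deductCredits names rather than ZenAI's real billing service:

```typescript
// Boundary-mocking sketch via injection (hypothetical names).

interface ChargeClient {
  // The external boundary, e.g. a thin wrapper around Stripe.
  charge(cents: number): { ok: boolean };
}

// Real service logic: this runs unmocked in every test.
function deductCredits(
  client: ChargeClient,
  balance: number,
  cost: number,
): number {
  if (cost <= 0) throw new Error("cost must be positive");
  if (balance >= cost) return balance - cost; // pure logic, fully exercised
  const result = client.charge((cost - balance) * 100); // only this line crosses the boundary
  if (!result.ok) throw new Error("charge failed");
  return 0;
}

// In a test, only the boundary is faked; everything above runs for real.
const fakeStripe: ChargeClient = { charge: () => ({ ok: true }) };
console.log(deductCredits(fakeStripe, 10, 4)); // balance covers cost: 6 left
console.log(deductCredits(fakeStripe, 2, 5));  // fake boundary absorbs the charge: 0 left
```

Mocking deeper inside (say, faking deductCredits itself) would make the test pass without ever exercising the credit arithmetic, which is exactly where the bugs live.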

3. The 5-test rule

For any non-trivial route or service: happy path, missing auth, invalid input, database error, edge case. Five tests, 10 minutes. The discipline of always writing these five catches 80% of real bugs.
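Here is the checklist applied to a toy route handler. The createTask signature and Db type are illustrative assumptions, not ZenAI's actual handler:

```typescript
// The five tests, sketched against a toy handler (hypothetical names).

type Db = { insert(title: string): { id: number } };

function createTask(
  userId: string | null,
  title: string,
  db: Db,
): { status: number; id?: number } {
  if (!userId) return { status: 401 };        // missing auth
  if (!title.trim()) return { status: 400 };  // invalid input
  try {
    return { status: 201, id: db.insert(title.trim()).id };
  } catch {
    return { status: 500 };                   // database error
  }
}

const okDb: Db = { insert: () => ({ id: 1 }) };
const downDb: Db = { insert: () => { throw new Error("db down"); } };

console.log(createTask("u1", "write tests", okDb).status);   // 1. happy path: 201
console.log(createTask(null, "write tests", okDb).status);   // 2. missing auth: 401
console.log(createTask("u1", "", okDb).status);              // 3. invalid input: 400
console.log(createTask("u1", "write tests", downDb).status); // 4. database error: 500
console.log(createTask("u1", "   ", okDb).status);           // 5. edge case, whitespace-only title: 400
```

Each of the five is a one-liner once the handler exists, which is why the whole set fits in ten minutes.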

4. Tests as documentation

The test names tell you what the system does. A describe('POST /api/:context/tasks') block with it('creates task with dependency tracking') inside it is better documentation than a README that goes stale.
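To make the idea concrete, here is what that naming convention reads like as a report. The describe/it shims below exist only so the sketch runs standalone; in the real suite those come from Jest:

```typescript
// Minimal describe/it shims (in the real suite, Jest provides these).
const report: string[] = [];
function describe(name: string, fn: () => void): void {
  report.push(name);
  fn();
}
function it(name: string, fn: () => void): void {
  report.push(`  ${name}`);
  fn();
}

describe("POST /api/:context/tasks", () => {
  it("creates task with dependency tracking", () => { /* assertions here */ });
  it("rejects unauthenticated requests with 401", () => { /* assertions here */ });
});

// The resulting report doubles as a behavioral spec:
console.log(report.join("\n"));
// POST /api/:context/tasks
//   creates task with dependency tracking
//   rejects unauthenticated requests with 401
```

Unlike a README, this "documentation" fails CI the moment it stops being true.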

The Phase 97 Turning Point

Around phase 97, I ran a deep quality audit: 59 fixes across 12 areas. Route coverage went from 38% to 98% in one sprint.

The insight was simple: untested code is a liability. Not a future problem. A present one. Every untested path is a behavior you can't reason about, a change you can't make safely, a bug you'll find in production instead of in your editor.

After that audit, test coverage became the metric I tracked most carefully. Not lines of code. Not features shipped. Tests passing.

What This Enabled

Here's a concrete example: the Phase 144 PR added Twitter OAuth, LinkedIn integration, governance flow, metrics workers, and a BullMQ scheduler — 15 files, 94 tests — in a single session.

That's possible because:

  1. The surrounding code had 95%+ coverage
  2. I could add the new module knowing exactly what interface it needed
  3. The tests for the new code were written alongside the implementation
  4. The CI pipeline caught two integration bugs before I even reviewed the PR

The tests paid for themselves in that single PR.

The Honest Cost

Writing tests takes time. On average, I spend 30-40% of implementation time on tests.

For a funded team on a deadline, that might feel like a luxury. For a solo developer building something meant to last, it's the only sane approach.

The math: 30% extra time upfront eliminates at least 5x that time in debugging, regression hunting, and fear-driven rewrites. I've watched funded teams with 10x the headcount ship half the features per week because their codebase had become fragile.

There's no hack around this. You either invest in tests or you pay the compound interest of technical debt.

The Setup

For anyone who wants to replicate this approach:

Backend (Jest + TypeScript):

cd backend && npm test                    # All 7,720 tests
cd backend && npm test -- --testPathPatterns="billing"  # Single suite
cd backend && npm test -- --coverage      # With coverage report

Frontend (Vitest):

cd frontend && npx vitest run             # All 1,400 tests

CI (GitHub Actions): 5 shards, SKIP_EXTERNAL_SERVICES=true, runs in ~45 seconds.
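The actual workflow file isn't shown here, but as a rough sketch of the sharding idea, Jest's built-in --shard flag (available since Jest 28) can split a suite across parallel CI jobs. These commands are an assumption about the setup, not the real pipeline:

```shell
# One job per shard, run in parallel by the CI matrix (sketch only).
# SKIP_EXTERNAL_SERVICES presumably gates the Docker/network/SSL skips.
SKIP_EXTERNAL_SERVICES=true npx jest --shard=1/5
SKIP_EXTERNAL_SERVICES=true npx jest --shard=2/5
# ... and so on through --shard=5/5
```

With roughly 7,700 backend tests, five-way sharding is what makes a ~45-second wall-clock run plausible.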

The test runner is the first thing I open every morning. Green means the previous day's work is solid. Red means I know exactly what to fix before starting anything new.

The Takeaway

If you're building something you intend to maintain for more than 6 months, the question isn't whether to write comprehensive tests. It's how to build the habit of writing them.

Start with the 5-test rule. Test the contract, not the implementation. Mock at the boundary. Make it so that shipping without tests feels wrong.

Nine months and 9,200 tests later: it's the habit I'm most glad I built.


ZenAI is open-source at github.com/Alexander-Bering/KI-AB. ZenBrain, the extracted memory system, is on npm as @zensation/algorithms and @zensation/core.