This page is kept honest: “shipped” means merged, tested, and covered by the benchmark gate; “next” lists the real reason each item is not done yet. Dated summaries of each pass live in the changelog.
Recently shipped
The 2026-07 performance pass:
- Much faster writes. Bulk import commits in batches with one fsync each — 50,000 documents load in about 2 seconds — and single durable writes now run at the fsync wall, the same physical wall every embedded database hits.
- Deferred cache persistence. Derived cache sidecars are no longer rewritten on every operation; checkpoints may lag the durable log by a bounded tail, and readers heal them lazily. Persistence is an optimization, never a correctness event.
describeshows the write contract. The live manual now includes each collection’s indexes and validation rules, so an agent knows what a write must satisfy before attempting it.- Incremental vector sync.
vsyncskips embeddings that are already fresh, andvsearch/contextreuse persisted vectors instead of re-embedding on every call. - MCP contract hardening. Unknown tool names report “unknown tool”, destructive tools carry real destructive hints, and every tool declares its required parameters in the schema.
- Structured missing-collection errors.
explainanddiagnosefail loudly on a missing collection with a did-you-mean hint, instead of silently returning nothing. - Compaction preserves history. Compacting the op-log keeps each surviving record’s original timestamp, so compaction no longer rewrites when things happened.
Next
Each item was deferred for a real reason, listed with it:
- Warm-open fast path. Opening from a checkpoint is benchmarked but not yet measurably faster than replaying the log — it ships when it actually wins.
- Paged caches, then a paged store. Every fresh process currently decodes whole cache sidecars before its first read; paging or memory-mapping them is the stepping stone to datasets bigger than RAM — the main gap to production scale.
- Uniform
--jsonoutput flag. Today onlyinspect --jsonexists; agents parse text for the rest of the CLI. - A real neural embedder (ONNX). The bundled embedder is a deterministic baseline; a neural drop-in needs a native-dependency and model-distribution story that doesn’t compromise the dependency-light build.
- TypeScript binding. A large new surface over the CLI/JSON layer; worth doing once, properly, with real Node tooling.
- HNSW vector index. Flat exact kNN is correct and fine at current scale; an approximate index must stay a derived, rebuildable artifact like every other binary file.
- Log-native rollback. Undo expressed in the op-log itself, rather than only in sidecar change files.
- Splitting the two largest source files. The store and query crates have grown monolithic files; splitting them is a dedicated refactor, deliberately not mixed with behavior changes.
Principles
- The text stays canonical. The NDJSON op-log is the single source of truth. Binary artifacts — caches, vector indexes, future paged checkpoints — are allowed only as derived, deletable, rebuildable files that
verifycan check at runtime. - Honest limitations. Every feature ships with a plain statement of its current limit, and this site never claims past it. If the docs and the engine disagree, that’s a bug in the docs.
- Every claim is verified. Behavior is proven by tests, and performance by a benchmark harness with a regression gate — a reintroduced hot-path cost fails loudly. A number you read here was measured, not estimated.