Roadmap

What recently shipped in OMGDB, what comes next, and the principles that decide the order.

This page is kept honest: “shipped” means merged, tested, and covered by the benchmark gate; “next” lists the real reason each item is not done yet. Dated summaries of each pass live in the changelog.

Recently shipped

The 2026-07 performance pass:

Much faster writes. Bulk import commits in batches with one fsync each (50,000 documents load in about 2 seconds), and single durable writes now run at the fsync wall: each write costs roughly one fsync, the same physical limit every embedded database hits.
Deferred cache persistence. Derived cache sidecars are no longer rewritten on every operation; checkpoints may lag the durable log by a bounded tail, and readers heal them lazily. Persistence is an optimization, never a correctness event.
describe shows the write contract. The live manual now includes each collection’s indexes and validation rules, so an agent knows what a write must satisfy before attempting it.
Incremental vector sync. vsync skips embeddings that are already fresh, and vsearch/context reuse persisted vectors instead of re-embedding on every call.
MCP contract hardening. Unknown tool names report “unknown tool”, destructive tools carry real destructive hints, and every tool declares its required parameters in the schema.
Structured missing-collection errors. explain and diagnose fail loudly on a missing collection with a did-you-mean hint, instead of silently returning nothing.
Compaction preserves history. Compacting the op-log keeps each surviving record’s original timestamp, so compaction no longer rewrites when things happened.
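
The batched commit described under "Much faster writes" can be sketched as follows. This is a minimal illustration and not OMGDB's actual import code: the function name `bulk_import`, the `batch_size` parameter, and the one-JSON-object-per-line record shape are assumptions made for the example. The point it shows is that the number of fsync calls tracks the number of batches, not the number of documents.

```python
import json
import os
from itertools import islice


def bulk_import(path, docs, batch_size=1000):
    """Append docs to an NDJSON file, paying one fsync per batch.

    Hypothetical helper for illustration; returns the number of fsync calls.
    """
    fsyncs = 0
    it = iter(docs)
    with open(path, "a", encoding="utf-8") as log:
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                break
            # Serialize the whole batch, then write it in one call.
            log.write("".join(json.dumps(doc) + "\n" for doc in batch))
            log.flush()             # move Python's buffer to the OS
            os.fsync(log.fileno())  # one durable commit for the whole batch
            fsyncs += 1
    return fsyncs
```

With `batch_size=1` the same function degenerates to one fsync per document, which is the single-durable-write case that sits at the fsync wall.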

Next

Each item below is deferred for a real reason, stated alongside it:

Warm-open fast path. Opening from a checkpoint is benchmarked but not yet measurably faster than replaying the log — it ships when it actually wins.
Paged caches, then a paged store. Every fresh process currently decodes whole cache sidecars before its first read; paging or memory-mapping them is the stepping stone to datasets bigger than RAM — the main gap to production scale.
Uniform --json output flag. Today only inspect --json exists; agents parse text for the rest of the CLI.
A real neural embedder (ONNX). The bundled embedder is a deterministic baseline; a neural drop-in needs a native-dependency and model-distribution story that doesn’t compromise the dependency-light build.
TypeScript binding. A large new surface over the CLI/JSON layer; worth doing once, properly, with real Node tooling.
HNSW vector index. Flat exact kNN is correct and fine at current scale; an approximate index must stay a derived, rebuildable artifact like every other binary file.
Log-native rollback. Undo expressed in the op-log itself, rather than only in sidecar change files.
Splitting the two largest source files. The store and query crates have grown monolithic files; splitting them is a dedicated refactor, deliberately not mixed with behavior changes.
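
For readers unfamiliar with the term, "flat exact kNN" in the HNSW item means scoring the query against every stored vector and keeping the best k. A minimal sketch follows, assuming cosine similarity and an in-memory dict of vectors; neither the metric nor the names `cosine` and `flat_knn` are taken from OMGDB itself.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flat_knn(query, vectors, k):
    """Exact top-k search: score every vector, keep the k highest.

    `vectors` maps a document id to its embedding (hypothetical layout).
    Returns a list of (doc_id, score) pairs, best first.
    """
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in vectors.items())
    top = heapq.nlargest(k, scored)
    return [(doc_id, score) for score, doc_id in top]
```

The scan is linear in the number of vectors, which is why it is always correct and why it stops being fine past a certain scale. An approximate index trades some recall for speed, and it can always be discarded and rebuilt from the same vectors.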

Principles

The text stays canonical. The NDJSON op-log is the single source of truth. Binary artifacts — caches, vector indexes, future paged checkpoints — are allowed only as derived, deletable, rebuildable files that verify can check at runtime.
Honest limitations. Every feature ships with a plain statement of its current limit, and this site never claims past it. If the docs and the engine disagree, that’s a bug in the docs.
Every claim is verified. Behavior is proven by tests, and performance by a benchmark harness with a regression gate — a reintroduced hot-path cost fails loudly. A number you read here was measured, not estimated.
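
The first principle, that binary artifacts are derived and the log is canonical, can be illustrated with a small replay-and-verify sketch. The record shape (`op`, `id`, `doc`) and the function names `replay`, `verify`, and `load` are hypothetical and chosen only for this example; OMGDB's real op-log format and its verify command may differ.

```python
import json


def replay(log_lines):
    """Rebuild collection state from NDJSON op-log lines (assumed record shape)."""
    state = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record["op"] == "put":
            state[record["id"]] = record["doc"]
        elif record["op"] == "delete":
            state.pop(record["id"], None)
    return state


def verify(log_lines, cache):
    """A cache is valid only if it equals what the log replays to."""
    return replay(log_lines) == cache


def load(log_lines, cache=None):
    """Trust the cache only when it verifies; otherwise fall back to the log."""
    if cache is not None and verify(log_lines, cache):
        return cache
    return replay(log_lines)
```

In this sketch, deleting or corrupting the cache changes how long `load` takes but never what it returns, which is the property that keeps persistence an optimization instead of a correctness event. A full replay is the simplest possible check; a real implementation would presumably use something cheaper on the hot path.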
