Recipes — OMGDB Docs

Three end-to-end walkthroughs — bulk-load 50,000 documents, build a searchable knowledge base for your agent, and make a bulk change you can undo.

Every step uses only commands the omgdb binary actually implements, and each recipe stands alone: run it in an empty directory and every step works as shown. If you haven’t installed OMGDB yet, start with install & build.

Note: As everywhere in the CLI, documents, filters, and updates are passed as JSON strings — on a POSIX shell, single-quote them so the shell does not eat the inner double quotes.
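To see what the shell does to an unquoted JSON argument, here is a quick sketch that uses printf as a stand-in for omgdb, so it runs anywhere:

```shell
# Single-quoted: the JSON reaches the program intact.
printf '%s\n' '{"group":7}'
# prints {"group":7}

# Unquoted: the shell removes the double quotes before the program sees them.
printf '%s\n' {"group":7}
# prints {group:7}
```

The second form is no longer valid JSON, which is why every filter and document in these recipes is wrapped in single quotes.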

Load 50,000 documents in about two seconds

import-jsonl streams a JSON Lines file into a collection in atomic batches — each batch is one transaction and one fsync. That amortization is the whole trick: durable writes are fsync-bound, so sharing one fsync across thousands of documents is how a bulk load goes fast.

First, generate a 50,000-line JSONL file (any tool works; here seq + awk):

seq 1 50000 | awk '{ printf "{\"n\":%d,\"name\":\"user-%d\",\"group\":%d}\n", $1, $1, $1 % 10 }' > users.jsonl

Create a store and import the file. --batch-size sets how many documents share each atomic commit (default 1000):

omgdb create app.omgdb
# created store at app.omgdb

omgdb import-jsonl app.omgdb users users.jsonl --batch-size 5000
# {"inserted":50000,"batches":10}

On a development machine this takes about 2 seconds. Every line must be a JSON object; a malformed line aborts the import with an error that names its line number. Only complete batches are committed, so a failed import never leaves a half-applied batch.
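Because one bad line stops the import, it can be worth checking the file's shape first. A rough sketch using only awk: first_bad_line is a hypothetical helper, not an omgdb command, and the check is deliberately crude (each line must start with { and end with }), so it catches blank or truncated lines but is not a JSON parser:

```shell
# Print the number of the first line that does not look like a JSON object
# and return non-zero; print nothing and return zero if every line passes.
first_bad_line() {
  awk '!/^[ \t]*[{].*[}][ \t\r]*$/ { print NR; bad = 1; exit } END { exit bad }' "$1"
}

# Usage: first_bad_line users.jsonl && echo "looks importable"
```

A file that passes can still contain invalid JSON inside the braces; import-jsonl remains the authority and will report the offending line number.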

Now prove the store is consistent. verify re-reads the op-log and checks that replaying it reproduces the live state, including every derived cache — the whole proof runs in about a second at this size:

omgdb verify app.omgdb
# OK: 50020 record(s), 50000 document(s) in 1 collection(s); log reproduces state
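The counts in the import and verify output can be predicted from the file size and the batch size. A rough sketch for estimating them, under two assumptions: a trailing partial batch is committed as its own batch, and each batch adds exactly two marker records (the begin/commit pair this recipe describes). expected_counts is a hypothetical helper, not an omgdb command:

```shell
# Estimate batch and op-log record counts for an import.
# $1 = number of documents, $2 = batch size
expected_counts() {
  docs=$1
  batch_size=$2
  # Integer ceiling division: a partial final batch still counts as a batch.
  batches=$(( (docs + batch_size - 1) / batch_size ))
  # One record per insert, plus an assumed begin and commit marker per batch.
  records=$(( docs + 2 * batches ))
  echo "batches=$batches records=$records"
}

expected_counts 50000 5000
# prints batches=10 records=50020
```

If verify reports a different record count on your store, trust verify; the estimate only holds while the store contains nothing but this one import.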

The record count is honest bookkeeping: 50,000 inserts plus a begin/commit marker pair for each of the 10 batches. From here, add an index and query as usual:

omgdb create-index app.omgdb users group
# created index on `users.group`

omgdb find app.omgdb users '{"group":7}' --limit 2
# {"_id":{"$oid":"..."},"n":7,"name":"user-7","group":7}
# {"_id":{"$oid":"..."},"n":17,"name":"user-17","group":7}

See indexes for what the planner can and cannot accelerate.

Give your agent a searchable knowledge base

A folder of Markdown notes becomes a queryable, semantically searchable collection your coding agent can retrieve from — locally, with no embedding service. Frontmatter becomes queryable fields, headings become a section tree under _sections, and the full text lands in the body field.
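If you have no notes folder to hand, the sketch below writes one sample note to try the recipe with. The file name and body text are made up for illustration, and frontmatter fenced by --- lines is an assumption about what import-md parses; only the title and the tags: ["ops"] field are taken from the examples on this page:

```shell
mkdir -p notes

# The quoted heredoc delimiter ('EOF') stops the shell from expanding anything inside.
cat > notes/deploy-runbook.md <<'EOF'
---
title: Deploy runbook
tags: ["ops"]
---

# Deploy runbook

## Before you start

Check that the release checklist is complete.

## Rollout

Deploy to staging first, then to production.
EOF
```

With a note like this, title and tags should become queryable fields and the two second-level headings should appear under _sections.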

Import the folder with a shell loop (import-md takes one file at a time and prints each new document’s _id):

omgdb create kb.omgdb

for f in notes/*.md; do omgdb import-md kb.omgdb notes "$f"; done
# {"$oid":"018f...a1"}
# {"$oid":"018f...a2"}
# ...

Persist embeddings for the body field. vsync is incremental: it embeds only documents whose stored vector is missing or stale, so re-running it after edits is cheap, and searches reuse the persisted vectors instead of re-embedding:

omgdb vsync kb.omgdb notes body
# synced 42 embedding(s) into `notes.__vectors` (fresh ones skipped)

Search semantically, and add a structured pre-filter for hybrid search. Filters use array-contains semantics, so if your notes carry frontmatter like tags: ["ops"], the filter {"tags":"ops"} matches any note whose tags array includes "ops":

omgdb vsearch kb.omgdb notes body "how do we deploy to production" --k 3 --filter '{"tags":"ops"}'
# 0.6412	{"_id":{"$oid":"..."},"title":"Deploy runbook","tags":["ops"],"body":"...","_sections":[...]}
# 0.4108	{"_id":{"$oid":"..."},"title":"Release checklist","tags":["ops"],...}
# ...
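Each result line is a score followed by the document, so standard text tools can post-process it. A small sketch, assuming the two are separated by a single tab as in the sample output above; min_score is a hypothetical helper and 0.5 is an arbitrary cutoff:

```shell
# Keep only result lines whose leading score is at least the given threshold.
# Reads vsearch-style lines (score<TAB>json) on stdin.
min_score() {
  awk -F '\t' -v min="$1" '$1 + 0 >= min + 0'
}

# Usage: omgdb vsearch kb.omgdb notes body "how do we deploy to production" --k 3 | min_score 0.5
```

Since the bundled embedder is only approximate, any cutoff is best chosen by looking at real scores from your own notes rather than fixed in advance.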

For handing context to the agent, build a token-budgeted pack instead: the most relevant chunks, best-first, cut off at the budget, each carrying its source _id as a citation:

omgdb context kb.omgdb notes body "draft the deploy runbook" --budget 800 --filter '{"tags":"ops"}'
# {"query":"draft the deploy runbook","budgetTokens":800,"usedTokens":763,"truncated":true,
#  "chunks":[{"id":{"$oid":"..."},"score":0.6412,"tokens":512,"text":"..."},...],
#  "citations":[{"$oid":"..."},...]}

To let the agent drive this itself, point an MCP server at the store — vsearch and context_pack are available even at read-only scope:

omgdb mcp --scope read

Note: The bundled embedder is a deterministic offline baseline, not a neural model, so relevance is approximate. See vector search and context packs for the honest details.

Make a bulk change you can undo

Bulk updates are where a bad filter does the most damage, so OMGDB splits them into plan → apply → rollback: a dry run you can inspect, an atomic commit by token, and an undo that restores the recorded before-state. Full background in agent-safe mutations.

Seed a collection where each document holds an array of tasks:

omgdb create app.omgdb
omgdb insert app.omgdb projects '{"project":"atlas","tasks":[{"name":"write spec","status":"todo"},{"name":"review","status":"done"}]}'
omgdb insert app.omgdb projects '{"project":"atlas","tasks":[{"name":"ship","status":"todo"}]}'

Plan the change: mark every todo task as done across the whole project. The $[t] placeholder in the update path is bound by --array-filters, so only matching array elements are touched. Nothing is written — this is a dry run:

omgdb plan-update app.omgdb projects '{"project":"atlas"}' \
  '{"$set":{"tasks.$[t].status":"done"}}' \
  --array-filters '[{"t.status":"todo"}]'
# {"token":"<token>","matched":2,"sampleBeforeAfter":[
#   {"before":{...,"tasks":[{"name":"write spec","status":"todo"},...]},
#    "after":{...,"tasks":[{"name":"write spec","status":"done"},...]}},
#   ...]}

Before applying, read the plan itself. It lives under pending/ as plain NDJSON — one line per matched document, holding both the original and the proposed version:

cat app.omgdb/pending/<token>.ndjson
# {"ns":"projects","id":{"$oid":"..."},"before":{...},"after":{...}}
# {"ns":"projects","id":{"$oid":"..."},"before":{...},"after":{...}}
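Because the plan holds one line per matched document, its line count should equal the matched figure that plan-update printed. A minimal guard to run before applying, assuming the file layout shown above; plan_count_ok is a hypothetical helper, not an omgdb command:

```shell
# Succeed only if the plan file has exactly the expected number of lines.
# $1 = path to the plan file, $2 = expected document count
plan_count_ok() {
  plan_file=$1
  expected=$2
  # wc -l counts newline characters; strip the padding some wc builds add.
  actual=$(wc -l < "$plan_file" | tr -d '[:space:]')
  [ "$actual" -eq "$expected" ]
}

# Usage: plan_count_ok "app.omgdb/pending/<token>.ndjson" 2 && omgdb apply app.omgdb <token>
```

A count that matches says nothing about whether the changes are right, so it complements reading the before and after fields rather than replacing it.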

Note that the review task is untouched: it was already done, and the array filter rewrote only elements whose status was todo. Apply the plan by its token, as a single all-or-nothing transaction:

omgdb apply app.omgdb <token>
# {"changeId":"<token>","applied":2}

On success the plan file moves from pending/ to changes/, where it becomes the change journal: a readable NDJSON record of exactly what changed, before and after, for every touched document. That journal is what makes the undo trustworthy — you can cat precisely what a rollback will restore:

cat app.omgdb/changes/<token>.ndjson
# the same {ns, id, before, after} lines, now an audit-and-undo record

omgdb rollback app.omgdb <token>
# rolled back 2 document(s)

Every task is back to its recorded before-state. The same three steps are exposed to agents as MCP tools (plan_update, apply, rollback) at read-write scope, which is what makes the pattern practical: an agent plans, a human (or a stricter agent) reviews the sample and the count, and only then does anyone apply. See agent-safe mutations and update operators.

Where to next

Quickstart — the guided tour of every core command.
CLI reference — every command and flag in one place.
MCP server — hand the store to an agent with enforced capability scopes.
Storage & op-log — why all of the above is rebuildable from one text file.