---
title: Recipes
description: Three end-to-end walkthroughs — bulk-load 50,000 documents, build a searchable knowledge base for your agent, and make a bulk change you can undo.
---

Three complete tasks, start to finish, using only commands the `omgdb` binary actually implements. Each recipe stands alone: run it in an empty directory and every step works as shown (the knowledge-base recipe additionally needs a folder of Markdown notes). If you haven't installed OMGDB yet, start with [install & build](/docs/install/).

> **Note:** As everywhere in the CLI, documents, filters, and updates are passed as JSON **strings** — on a POSIX shell, single-quote them so the shell does not eat the inner double quotes.
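A quick way to see why the quoting matters, using only `printf` (no OMGDB involved). The last line shows one common way to splice a shell variable into a JSON argument:

```sh
# Single quotes keep the inner double quotes intact:
printf '%s\n' '{"group":7}'        # {"group":7}

# Unquoted, the shell removes them and the JSON is no longer valid:
printf '%s\n' {"group":7}          # {group:7}

# To splice in a shell variable, close the single quotes around it:
g=7
printf '%s\n' '{"group":'"$g"'}'   # {"group":7}
```

This splice is only safe for values that need no JSON escaping, such as numbers; a string containing quotes or backslashes must be escaped first.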

## Load 50,000 documents in about two seconds

`import-jsonl` streams a JSON Lines file into a collection in atomic batches — each batch is one transaction and one fsync. That amortization is the whole trick: durable writes are fsync-bound, so sharing one fsync across thousands of documents is how a bulk load goes fast.

First, generate a 50,000-line JSONL file (any tool works; here `seq` + `awk`):

```sh
seq 1 50000 | awk '{ printf "{\"n\":%d,\"name\":\"user-%d\",\"group\":%d}\n", $1, $1, $1 % 10 }' > users.jsonl
```

Create a store and import the file. `--batch-size` sets how many documents share each atomic commit (default 1000):

```sh
omgdb create app.omgdb
# created store at app.omgdb

omgdb import-jsonl app.omgdb users users.jsonl --batch-size 5000
# {"inserted":50000,"batches":10}
```
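The `batches` figure is a ceiling division of the document count by `--batch-size`, and each batch costs one transaction and one fsync. The sketch below (a hypothetical helper, not an OMGDB command) also derives the op-log record count that `verify` reports, assuming, as explained under `verify`, one record per inserted document plus a `begin`/`commit` pair per batch:

```sh
# expected_counts DOCS BATCH_SIZE
# Prints the number of batches (each one transaction and one fsync) and the
# op-log record count, assuming one record per inserted document plus a
# begin/commit marker pair per batch.
expected_counts() {
  docs=$1
  size=$2
  batches=$(( (docs + size - 1) / size ))   # ceiling division: a partial last batch still counts
  records=$(( docs + 2 * batches ))
  echo "batches=$batches records=$records"
}

expected_counts 50000 5000   # batches=10 records=50020
expected_counts 50000 1000   # batches=50 records=50100 (the default batch size)
```

Committing each document on its own would instead mean one fsync per document, 50,000 in all, rather than 10.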

On a development machine the import takes about 2 seconds. Every line must be a JSON object: a malformed line aborts the import with an error naming its line number. Only complete batches are ever committed, so batches that finished before the bad line stay committed, while the batch containing it is never half-applied.
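If you would rather catch bad lines before starting the import, a rough pre-flight check with `awk` can flag lines that are not brace-delimited. `check_jsonl` is a hypothetical helper, not part of OMGDB, and only a shape heuristic rather than a JSON parser; the importer's own error remains authoritative:

```sh
# check_jsonl FILE: rough pre-flight check before `omgdb import-jsonl`.
# Flags lines that do not look like a single JSON object (they must start
# with "{" and end with "}"). This is a shape heuristic, not a JSON parser:
# it will not catch, e.g., a missing comma inside an otherwise braced line.
check_jsonl() {
  awk '
    # Report any line that is not brace-delimited, with its line number.
    !/^[ \t]*[{].*[}][ \t]*$/ {
      printf "line %d: not a JSON object: %s\n", NR, $0
      bad++
    }
    END {
      printf "%d line(s), %d suspect\n", NR, bad + 0
      exit (bad > 0)   # non-zero exit status if anything looked wrong
    }
  ' "$1"
}
```

For example: `check_jsonl users.jsonl && omgdb import-jsonl app.omgdb users users.jsonl --batch-size 5000`. Blank lines are flagged too, since this sketch makes no assumption about whether `import-jsonl` skips them.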

Now prove the store is consistent. `verify` re-reads the op-log and checks that replaying it reproduces the live state, including every derived cache — the whole proof runs in about a second at this size:

```sh
omgdb verify app.omgdb
# OK: 50020 record(s), 50000 document(s) in 1 collection(s); log reproduces state
```

The record count is honest bookkeeping: 50,000 inserts plus a `begin`/`commit` marker pair for each of the 10 batches. From here, add an index and query as usual:

```sh
omgdb create-index app.omgdb users group
# created index on `users.group`

omgdb find app.omgdb users '{"group":7}' --limit 2
# {"_id":{"$oid":"..."},"n":7,"name":"user-7","group":7}
# {"_id":{"$oid":"..."},"n":17,"name":"user-17","group":7}
```

See [indexes](/docs/indexes/) for what the planner can and cannot accelerate.

## Give your agent a searchable knowledge base

A folder of Markdown notes becomes a queryable, semantically searchable collection your coding agent can retrieve from — locally, with no embedding service. Frontmatter becomes queryable fields, headings become a section tree under `_sections`, and the full text lands in the `body` field.
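The recipe assumes a `notes/` folder already exists. To try it from an empty directory, you could seed a couple of files like these. The content is illustrative, and the `---`-delimited YAML-style frontmatter is an assumption based on the `tags: ["ops"]` example used below; check the import docs for the exact frontmatter syntax `import-md` accepts:

```sh
# Seed a small notes/ folder (illustrative content only).
mkdir -p notes

# Frontmatter fields (title, tags) should become queryable fields;
# the headings should become entries in the `_sections` tree.
cat > notes/deploy-runbook.md <<'EOF'
---
title: Deploy runbook
tags: ["ops"]
---
# Deploy runbook

## Before you start
Check that the release branch is green.

## Rolling out
Deploy to staging first, then to production.
EOF

cat > notes/release-checklist.md <<'EOF'
---
title: Release checklist
tags: ["ops"]
---
# Release checklist

Tag the release and write the changelog entry.
EOF
```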

Import the folder with a shell loop (`import-md` takes one file at a time and prints each new document's `_id`):

```sh
omgdb create kb.omgdb

for f in notes/*.md; do omgdb import-md kb.omgdb notes "$f"; done
# {"$oid":"018f...a1"}
# {"$oid":"018f...a2"}
# ...
```
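The one-line loop has a POSIX-shell gotcha: if `notes/` holds no `.md` files, the unmatched pattern `notes/*.md` is passed to `import-md` literally. A slightly more defensive version as a function, a sketch rather than anything OMGDB ships; the `OMGDB` variable is a hypothetical override that lets you dry-run the loop by setting `OMGDB=echo`:

```sh
# import_notes STORE DIR: import every DIR/*.md into the `notes` collection.
# The existence test skips the literal pattern "DIR/*.md" that a POSIX shell
# leaves behind when the glob matches nothing.
# OMGDB is a hypothetical override (default: omgdb); set OMGDB=echo to print
# the commands instead of running them.
import_notes() {
  store=$1
  dir=$2
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue
    "${OMGDB:-omgdb}" import-md "$store" notes "$f" || return 1
  done
}
```

Call it as `import_notes kb.omgdb notes`. Like the original loop it only picks up top-level `.md` files, and it stops at the first file `import-md` rejects; drop `|| return 1` to keep going instead.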

Persist embeddings for the `body` field. `vsync` is incremental: it embeds only documents whose stored vector is missing or stale, so re-running it after edits is cheap, and searches reuse the persisted vectors instead of re-embedding:

```sh
omgdb vsync kb.omgdb notes body
# synced 42 embedding(s) into `notes.__vectors` (fresh ones skipped)
```

Search semantically, and combine with a structured pre-filter for hybrid search. If your notes carry frontmatter like `tags: ["ops"]`, array-contains semantics make the filter natural: `{"tags":"ops"}` matches any document whose `tags` array contains `"ops"`.

```sh
omgdb vsearch kb.omgdb notes body "how do we deploy to production" --k 3 --filter '{"tags":"ops"}'
# 0.6412	{"_id":{"$oid":"..."},"title":"Deploy runbook","tags":["ops"],"body":"...","_sections":[...]}
# 0.4108	{"_id":{"$oid":"..."},"title":"Release checklist","tags":["ops"],...}
# ...
```
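Because each result line is a score, a tab, and the document (as in the output above), ordinary text tools can post-process it. A hypothetical `min_score` helper that drops weak matches below a threshold; the 0.5 cutoff in the usage line is arbitrary, and scores from the baseline embedder are only approximate:

```sh
# min_score THRESHOLD: keep only vsearch result lines whose score is at least
# THRESHOLD. Assumes each line is "<score><TAB><document JSON>", as shown above.
# Compact JSON escapes tabs inside strings, so the first tab ends the score.
min_score() {
  awk -v min="$1" 'BEGIN { FS = "\t"; m = min + 0 } ($1 + 0) >= m'
}

# Example: drop weak matches before handing results on.
# omgdb vsearch kb.omgdb notes body "how do we deploy to production" --k 3 | min_score 0.5
```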

For handing context to the agent, build a token-budgeted pack instead: the most relevant chunks, best-first, cut off at the budget, each carrying its source `_id` as a citation:

```sh
omgdb context kb.omgdb notes body "draft the deploy runbook" --budget 800 --filter '{"tags":"ops"}'
# {"query":"draft the deploy runbook","budgetTokens":800,"usedTokens":763,"truncated":true,
#  "chunks":[{"id":{"$oid":"..."},"score":0.6412,"tokens":512,"text":"..."},...],
#  "citations":[{"$oid":"..."},...]}
```
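To make "best-first, cut off at the budget" concrete, here is a rough sketch of greedy packing over a list of pre-scored chunks. The stopping rule (stop at the first chunk that would overflow) and the input format are assumptions for illustration, not OMGDB's documented algorithm, and `pack_budget` is a hypothetical helper:

```sh
# pack_budget BUDGET: greedy best-first packing (illustrative policy only).
# Input lines: "<score><TAB><tokens><TAB><id>". Prints the ids that fit, in
# score order, then a summary line with the tokens used and whether
# anything was cut off.
pack_budget() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1nr |   # highest score first
  awk -v budget="$1" '
    BEGIN { FS = "\t"; b = budget + 0; used = 0; truncated = "false" }
    truncated == "false" {
      if (used + $2 <= b) { used += $2; print $3 }
      else truncated = "true"   # first chunk that does not fit ends the pack
    }
    END { printf "usedTokens=%d truncated=%s\n", used, truncated }
  '
}
```

A different policy, such as skipping an oversized chunk and trying smaller ones after it, would fill the budget more tightly at the cost of strict best-first order.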

To let the agent drive this itself, point an MCP server at the store — `vsearch` and `context_pack` are available even at read-only scope:

```sh
omgdb mcp --scope read
```

> **Note:** The bundled embedder is a deterministic offline baseline, not a neural model, so relevance is approximate. See [vector search](/docs/vector-search/) and [context packs](/docs/context-packs/) for the honest details.

## Make a bulk change you can undo

Bulk updates are where a bad filter does the most damage, so OMGDB splits them into **plan → apply → rollback**: a dry run you can inspect, an atomic commit by token, and an undo that restores the recorded before-state. Full background in [agent-safe mutations](/docs/agent-mutations/).

Seed a collection where each document holds an array of tasks:

```sh
omgdb create app.omgdb
omgdb insert app.omgdb projects '{"project":"atlas","tasks":[{"name":"write spec","status":"todo"},{"name":"review","status":"done"}]}'
omgdb insert app.omgdb projects '{"project":"atlas","tasks":[{"name":"ship","status":"todo"}]}'
```

Plan the change: mark every `todo` task as `done` across the whole project. The `$[t]` placeholder in the update path is bound by `--array-filters`, so only matching array elements are touched. **Nothing is written** — this is a dry run:

```sh
omgdb plan-update app.omgdb projects '{"project":"atlas"}' \
  '{"$set":{"tasks.$[t].status":"done"}}' \
  --array-filters '[{"t.status":"todo"}]'
# {"token":"<token>","matched":2,"sampleBeforeAfter":[
#   {"before":{...,"tasks":[{"name":"write spec","status":"todo"},...]},
#    "after":{...,"tasks":[{"name":"write spec","status":"done"},...]}},
#   ...]}
```

Before applying, read the plan itself. It lives under `pending/` as plain NDJSON — one line per matched document, holding both the original and the proposed version:

```sh
cat app.omgdb/pending/<token>.ndjson
# {"ns":"projects","id":{"$oid":"..."},"before":{...},"after":{...}}
# {"ns":"projects","id":{"$oid":"..."},"before":{...},"after":{...}}
```
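Since the plan is plain NDJSON, you can also check it mechanically before applying. A hypothetical `review_plan` helper, assuming the layout shown above (one compact line per matched document, with `ns` as the first key):

```sh
# review_plan STORE TOKEN EXPECTED: sanity-check a pending plan before apply.
# Assumes STORE/pending/TOKEN.ndjson holds one compact JSON line per matched
# document, each starting with {"ns":"...".
review_plan() {
  plan="$1/pending/$2.ndjson"
  [ -f "$plan" ] || { echo "no pending plan at $plan" >&2; return 1; }
  lines=$(wc -l < "$plan" | tr -d ' ')
  echo "$lines document(s) in plan; namespaces:"
  # Count documents per namespace, e.g. "2 projects".
  sed -n 's/^{"ns":"\([^"]*\)".*/\1/p' "$plan" | sort | uniq -c
  [ "$lines" -eq "$3" ]   # non-zero exit if the count is not what you expected
}
```

For example, `review_plan app.omgdb <token> 2 && omgdb apply app.omgdb <token>` applies only if the plan touches exactly the two documents you expected.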

In the `before`/`after` pairs, the `review` task is unchanged: it was already `done`, and the array filter rewrites only elements whose `status` was `todo`. Apply the plan by its token; it runs as one transaction, all-or-nothing:

```sh
omgdb apply app.omgdb <token>
# {"changeId":"<token>","applied":2}
```

On success the plan file moves from `pending/` to `changes/`, where it becomes the change journal: a readable NDJSON record of exactly what changed, before and after, for every touched document. That journal is what makes the undo trustworthy — you can `cat` precisely what a rollback will restore:

```sh
cat app.omgdb/changes/<token>.ndjson
# the same {ns, id, before, after} lines, now an audit-and-undo record

omgdb rollback app.omgdb <token>
# rolled back 2 document(s)
```
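Because `apply` moves the plan file from `pending/` to `changes/`, the file's location tells you whether a token has been applied. A small hypothetical helper built on that layout; what happens to the journal after a rollback isn't described here, so the sketch doesn't try to detect it:

```sh
# plan_status STORE TOKEN: report where a plan file currently lives.
# Assumes unapplied plans sit in STORE/pending/ and applied ones have been
# moved to STORE/changes/, as described above.
plan_status() {
  if [ -f "$1/changes/$2.ndjson" ]; then
    echo applied
  elif [ -f "$1/pending/$2.ndjson" ]; then
    echo pending
  else
    echo unknown
  fi
}
```

For instance, guard a rollback with `[ "$(plan_status app.omgdb <token>)" = applied ] && omgdb rollback app.omgdb <token>`.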

Every task is back to its recorded before-state. The same three steps are exposed to agents as MCP tools (`plan_update`, `apply`, `rollback`) at read-write scope, which is what makes the pattern practical: an agent plans, a human (or a stricter agent) reviews the sample and the count, and only then does anyone apply. See [agent-safe mutations](/docs/agent-mutations/) and [update operators](/docs/update-operators/).

## Where to next

- [Quickstart](/docs/quickstart/) — the guided tour of every core command.
- [CLI reference](/docs/cli/) — every command and flag in one place.
- [MCP server](/docs/mcp/) — hand the store to an agent with enforced capability scopes.
- [Storage & op-log](/docs/storage/) — why all of the above is rebuildable from one text file.
