Storage & Op-Log

How OMGDB persists data as a human-readable append-only NDJSON operation log, the framed record format, and the single-file .omgdb transport archive.

OMGDB has one canonical source of truth on disk: an append-only operation log named oplog.ndjson. Every mutation is recorded as a framed, CRC-protected line of canonical JSON, and the entire logical state — documents, collection validation rules, and secondary indexes — is rebuilt in memory by replaying that log when the store is opened. There is no separate binary table file or page cache that holds authoritative data; the log is the database.

A live store is a directory bundle, not a single file. The directory contains oplog.ndjson (and, when used, change-audit sidecars and an advisory lock file). The single-file .omgdb form you may have seen is a transport/archive format produced by omgdb pack — it is not the live storage engine. This page documents the on-disk record format, the durability model, and the pack/unpack archive. For how operations are grouped atomically, see transactions; for how documents and _id are modeled, see the data model.

The op-log file

Each store directory holds exactly one log file:

app.omgdb/
  oplog.ndjson      # the canonical, append-only operation log

The log is plain newline-delimited text. You can read it with any tool — including cat — and every line is self-describing:

cat app.omgdb/oplog.ndjson

Because the log fully determines state, derived structures (secondary indexes, vector records, caches) hold zero authoritative bits. This is invariant I1 (text completeness): replaying oplog.ndjson on open reconstructs the entire logical state.

Framed record format

Each physical line is framed as:

<canonical-json>\t<crc32-hex>\n

That is, a single canonical-JSON object, a literal ASCII tab, the CRC-32 of the JSON bytes as 8 lowercase hexadecimal digits (zero-padded, {crc:08x}), then a newline. Canonical JSON escapes tabs inside strings, so the framing tab can never collide with record content.

An example line (an insert of a one-field document keyed by integer 1):

{"lsn":0,"ts":{"$date":0},"op":"insert","ns":"c","id":1,"doc":{"_id":1,"v":"row-1"}}	a1b2c3d4

Note: The 8-hex-digit suffix after the tab is the CRC-32 of everything before the tab. The value shown above is illustrative; the real CRC is computed from the exact canonical-JSON bytes of that line.
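
The framing rule can be sketched in a few lines of Rust. This is an illustration, not OMGDB's code: it assumes "CRC-32" means the common IEEE 802.3 / zlib variant (this page does not name the polynomial), and `crc32`, `frame_record`, and `check_line` are hypothetical names.

```rust
/// CRC-32, ASSUMED to be the IEEE 802.3 / zlib variant
/// (reflected polynomial 0xEDB88320, initial value and final XOR of all ones).
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        crc ^= u32::from(b);
        for _ in 0..8 {
            // `mask` is all ones when the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Frame one canonical-JSON object as `<json>\t<crc32-hex>\n`.
fn frame_record(canonical_json: &str) -> String {
    format!("{}\t{:08x}\n", canonical_json, crc32(canonical_json.as_bytes()))
}

/// Check one complete line (newline already stripped) and return its JSON part.
fn check_line(line: &str) -> Result<&str, String> {
    // Canonical JSON escapes tabs, so the last tab is the framing tab.
    let (json, hex) = line.rsplit_once('\t').ok_or("missing framing tab")?;
    let well_formed = hex.len() == 8
        && hex.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if !well_formed {
        return Err("CRC suffix is not 8 lowercase hex digits".to_string());
    }
    let stored = u32::from_str_radix(hex, 16).map_err(|e| e.to_string())?;
    if stored != crc32(json.as_bytes()) {
        return Err("CRC mismatch".to_string());
    }
    Ok(json)
}
```

Framing a record and then passing the line (minus its newline) to `check_line` returns the original JSON; changing any byte before the tab makes the check fail.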

Record fields

Every record serializes a common header plus op-specific fields:

Field	Type	Description
`lsn`	integer (>= 0)	Log sequence number. Dense and contiguous: each record’s `lsn` must equal its zero-based position in the file (the first record has `lsn` 0).
`ts`	datetime (ms since Unix epoch)	Wall-clock time of the append. Serialized as a `$date` value; also accepted as a plain integer on read.
`txn`	integer (>= 0), optional	The owning transaction id. Omitted for auto-committed (single) operations. See transactions.
`op`	string	The operation discriminant (see below).
`ns`	string	The collection (namespace). Present on the data ops (`insert`, `replace`, `delete`) and on `define` and `create_index`.
`id`	any	The document `_id`. Present on `insert`, `replace`, and `delete`.
`doc`	object	The full document including its `_id`. Present on `insert` and `replace`.
`spec`	object	A collection validation spec. Present on `define`.
`field`	string	The indexed field name. Present on `create_index`.

Op kinds

The op string takes one of eight tokens:

`op`	Fields	Meaning
`insert`	`ns`, `id`, `doc`	Add a new document.
`replace`	`ns`, `id`, `doc`	Replace an existing document by `_id` (an edit).
`delete`	`ns`, `id`	Remove a document — written as a tombstone record, not by erasing earlier bytes.
`define`	`ns`, `spec`	Define or redefine a collection’s validation rules. See schema validation.
`create_index`	`ns`, `field`	Create a secondary index on `ns.field`. See indexes.
`begin`	—	Start of a multi-operation transaction.
`commit`	—	Commit a transaction; its buffered ops become visible on replay.
`abort`	—	Discard a transaction’s ops on replay.
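
One way to model the table above in code is a plain enum. The sketch below is illustrative only; `OpKind`, `parse`, and `fields` are hypothetical names and not part of OMGDB's API.

```rust
/// The eight `op` tokens listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpKind {
    Insert,
    Replace,
    Delete,
    Define,
    CreateIndex,
    Begin,
    Commit,
    Abort,
}

impl OpKind {
    /// Parse the on-disk token; unknown tokens yield `None`.
    fn parse(token: &str) -> Option<OpKind> {
        Some(match token {
            "insert" => OpKind::Insert,
            "replace" => OpKind::Replace,
            "delete" => OpKind::Delete,
            "define" => OpKind::Define,
            "create_index" => OpKind::CreateIndex,
            "begin" => OpKind::Begin,
            "commit" => OpKind::Commit,
            "abort" => OpKind::Abort,
            _ => return None,
        })
    }

    /// Op-specific fields that accompany the common header.
    fn fields(self) -> &'static [&'static str] {
        match self {
            OpKind::Insert | OpKind::Replace => &["ns", "id", "doc"],
            OpKind::Delete => &["ns", "id"],
            OpKind::Define => &["ns", "spec"],
            OpKind::CreateIndex => &["ns", "field"],
            OpKind::Begin | OpKind::Commit | OpKind::Abort => &[],
        }
    }
}
```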

Note: An edit is recorded as an append, never an in-place rewrite. A delete appends a tombstone and a replace appends a new full document; the superseded records remain in the file until compaction rewrites the log to its minimal form.

Transaction markers on disk

The store itself never writes an abort record. It aborts a transaction by writing nothing: a dangling begin with no matching commit (for example, after a crash mid-transaction) is dropped entirely on replay. The explicit abort token is only produced by external log producers, but replay honours it for completeness. Atomicity, grouping, and isolation are covered in transactions.
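
The replay rule for markers can be sketched as a small filter. This is a simplified model, not the real replay code: it assumes `begin`, `commit`, and `abort` records carry the `txn` id in their header, and it uses a string label in place of each op's payload. `Entry` and `visible_ops` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Begin { txn: u64 },
    Commit { txn: u64 },
    Abort { txn: u64 },
    /// A data op; `txn` is `None` for auto-committed operations.
    Data { txn: Option<u64>, label: &'static str },
}

/// Return the ops that become visible after replaying `log`.
fn visible_ops(log: &[Entry]) -> Vec<&'static str> {
    let mut pending: HashMap<u64, Vec<&'static str>> = HashMap::new();
    let mut visible = Vec::new();
    for entry in log {
        match entry {
            Entry::Begin { txn } => {
                pending.insert(*txn, Vec::new());
            }
            Entry::Data { txn: Some(t), label } => {
                // Buffer the op until its transaction commits.
                // (This sketch silently ignores ops with no matching begin.)
                if let Some(buffer) = pending.get_mut(t) {
                    buffer.push(*label);
                }
            }
            Entry::Data { txn: None, label } => visible.push(*label),
            Entry::Commit { txn } => {
                if let Some(buffer) = pending.remove(txn) {
                    visible.extend(buffer);
                }
            }
            Entry::Abort { txn } => {
                pending.remove(txn);
            }
        }
    }
    // Whatever is still pending is a dangling begin and is dropped.
    visible
}
```

A log holding one auto-committed op, one committed transaction, one dangling `begin`, and one explicit `abort` yields only the first two groups of ops.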

Durability model

The write path for every mutation is strictly ordered:

append framed record to oplog.ndjson  ->  flush + fsync  ->  apply in memory

The fsync happens before the in-memory state changes, so a crash can never leave committed memory ahead of the durable log. A transaction appends begin, its ops, and commit, fsyncs once, then applies all of them in memory together.
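
The ordering can be sketched with the standard library alone. `MiniStore` is a hypothetical stand-in, not the real store type; its "in-memory state" is just a list of the lines applied so far, and the lines are assumed to be framed already.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

struct MiniStore {
    log: File,
    applied: Vec<String>, // stand-in for the real in-memory state
}

impl MiniStore {
    fn open(path: &Path) -> io::Result<Self> {
        let log = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { log, applied: Vec::new() })
    }

    /// Append one or more already-framed lines, fsync once, then apply.
    /// A single op passes one line; a transaction passes begin, ops, commit.
    fn commit(&mut self, framed_lines: &[String]) -> io::Result<()> {
        for line in framed_lines {
            self.log.write_all(line.as_bytes())?; // 1. append
        }
        self.log.flush()?; // 2a. flush (a no-op for an unbuffered File)
        self.log.sync_all()?; // 2b. fsync: data and metadata reach the disk
        // 3. Only now does memory change, so it can never run ahead of the log.
        self.applied.extend(framed_lines.iter().cloned());
        Ok(())
    }
}
```

If any step before the apply returns an error, `applied` is left untouched, which is the property the ordering exists to guarantee.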

Per-record CRC-32

When the log is read, the CRC is recomputed over the JSON bytes and compared to the stored value. A complete (newline-terminated) line whose CRC fails to match is treated as corruption and the default open path stops (fail-stop):

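// oplog.rs test crc_detects_corruption
// Change one byte of the first record's JSON so its stored CRC no longer matches.
let corrupted = fs::read_to_string(&path).unwrap().replacen('{', "[", 1);
fs::write(&path, corrupted).unwrap();
// The line is still newline-terminated, so this is corruption, not a torn tail.
assert!(matches!(read_log(&path), Err(LogError::Corruption(_))));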

LSNs are also integrity-checked: a record’s lsn must equal its expected dense position, so a gap or a repeat (for example, two writers each starting at lsn 0) is reported as corruption.
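
The density rule amounts to comparing each `lsn` with its zero-based index. A minimal sketch, assuming the `lsn` values have already been parsed out of the records (`check_dense_lsns` is a hypothetical name):

```rust
/// Verify that every `lsn` equals its zero-based position in the log.
fn check_dense_lsns(lsns: &[u64]) -> Result<(), String> {
    for (position, &lsn) in lsns.iter().enumerate() {
        if lsn != position as u64 {
            return Err(format!(
                "corruption: found lsn {} at position {}",
                lsn, position
            ));
        }
    }
    Ok(())
}
```

A gap such as `[0, 2]` and a repeat such as `[0, 0]` are both rejected at position 1.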

Torn-tail crash recovery

A record is durable only once its terminating newline reaches disk. On open, the bytes are split at the last newline; any trailing fragment after it is an incomplete crash-time write and is dropped, with the event flagged as a truncated tail.

// oplog.rs test torn_tail_is_skipped
let mut full = fs::read_to_string(&path).unwrap();
full.push_str("{\"lsn\":1,\"ts\":{\"$date\":0},\"op\":\"inse"); // crash mid-append
fs::write(&path, full).unwrap();
let replay = read_log(&path).unwrap();
assert_eq!(replay.records.len(), 1);
assert!(replay.truncated_tail);

This works even when the torn fragment is not valid UTF-8 — for example, a multi-byte character cut mid-encoding. Such a fragment in the unterminated tail is ignored. By contrast, invalid UTF-8 inside a complete (newline-terminated) record is genuine corruption and is fail-stop.
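
The split is done on raw bytes, before any UTF-8 decoding, which is why a cut multi-byte character in the tail is harmless. A minimal sketch of that step (`Split`, `split_torn_tail`, and `complete_lines` are hypothetical names, and CRC checking is left out):

```rust
/// The complete prefix of a log and whether a torn fragment followed it.
struct Split<'a> {
    complete: &'a [u8],
    truncated_tail: bool,
}

/// Split the raw log bytes at the last newline.
fn split_torn_tail(bytes: &[u8]) -> Split<'_> {
    match bytes.iter().rposition(|&b| b == b'\n') {
        Some(i) => Split {
            complete: &bytes[..=i],
            truncated_tail: i + 1 < bytes.len(),
        },
        // No newline at all: everything present is an unterminated fragment.
        None => Split {
            complete: &[],
            truncated_tail: !bytes.is_empty(),
        },
    }
}

/// Decode only the complete prefix. Invalid UTF-8 there is a hard error,
/// while invalid UTF-8 in the dropped tail is never even inspected.
fn complete_lines(bytes: &[u8]) -> Result<(Vec<&str>, bool), std::str::Utf8Error> {
    let split = split_torn_tail(bytes);
    let text = std::str::from_utf8(split.complete)?;
    Ok((text.split_terminator('\n').collect(), split.truncated_tail))
}
```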

The writer also guards against splicing: if any write fails, the writer is poisoned and refuses all further appends and syncs. This prevents the next record’s bytes from being concatenated onto torn bytes and forming a complete-but-invalid line. The torn bytes remain an unterminated tail that replay safely skips; recovery is to reopen and replay the intact prefix.
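
Poisoning can be sketched as a flag that is set on the first failed write and checked before every later operation. `PoisoningWriter` is a hypothetical wrapper, and `FlakySink` is a test double that simulates a write failing partway through; neither is an OMGDB type.

```rust
use std::io::{self, Write};

struct PoisoningWriter<W: Write> {
    inner: W,
    poisoned: bool,
}

impl<W: Write> PoisoningWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, poisoned: false }
    }

    fn refuse_if_poisoned(&self) -> io::Result<()> {
        if self.poisoned {
            return Err(io::Error::new(io::ErrorKind::Other, "writer is poisoned"));
        }
        Ok(())
    }

    /// Append one framed line; the first failure poisons the writer for good.
    fn append(&mut self, framed_line: &[u8]) -> io::Result<()> {
        self.refuse_if_poisoned()?;
        if let Err(e) = self.inner.write_all(framed_line) {
            self.poisoned = true;
            return Err(e);
        }
        Ok(())
    }

    /// Stand-in for the sync step; also refused once poisoned.
    fn sync(&mut self) -> io::Result<()> {
        self.refuse_if_poisoned()?;
        self.inner.flush()
    }
}

/// Test double: when `fail_next` is set, keeps half the bytes and then errors.
struct FlakySink {
    bytes: Vec<u8>,
    fail_next: bool,
}

impl Write for FlakySink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.fail_next {
            self.fail_next = false;
            self.bytes.extend_from_slice(&buf[..buf.len() / 2]); // torn bytes
            return Err(io::Error::new(io::ErrorKind::Other, "simulated failure"));
        }
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

After the simulated failure the sink holds one complete line followed by a torn fragment, and nothing is ever appended after that fragment.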

Limitation: The strict open path is fail-stop on any complete corrupt record. Recovering the intact prefix of a damaged log is an opt-in operation, exposed via the repair workflow rather than silently performed on open.

Replay rebuilds everything

The whole logical state lives in memory and is rebuilt from scratch by folding the op-log on every open. This is the current “cache-off” model: there are no persistent binary indexes or caches yet, so startup cost scales with total log size until compaction shrinks it. Reopening the store reproduces the exact same logical state — invariant I2 (rebuild equivalence).
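
Replay is a left fold over the log starting from an empty state. The sketch below models a single collection with integer ids and string documents; `Op` and `replay` are hypothetical names, and transactions, validation, and indexes are left out. Unlike a real store, it treats an `insert` of an existing id as an overwrite.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone)]
enum Op {
    Insert { id: i64, doc: String },
    Replace { id: i64, doc: String },
    Delete { id: i64 },
}

/// Fold the log from an empty state; later records supersede earlier ones.
fn replay(log: &[Op]) -> BTreeMap<i64, String> {
    let mut docs = BTreeMap::new();
    for op in log {
        match op {
            Op::Insert { id, doc } | Op::Replace { id, doc } => {
                docs.insert(*id, doc.clone());
            }
            // A delete is a tombstone in the log; in memory it removes the key.
            Op::Delete { id } => {
                docs.remove(id);
            }
        }
    }
    docs
}
```

Because `replay` depends only on the log, calling it twice on the same records gives equal maps, which is the rebuild-equivalence property in miniature.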

Limitation: Because the entire dataset is held in RAM and replayed on open, a store does not scale beyond available memory. A paged/on-disk binary store is planned (not yet implemented).

Integrity check

The integrity check (which backs the verify workflow) re-reads the on-disk log, verifying every CRC, re-folds it, and asserts that the rebuilt data, catalog, and indexes equal the live in-memory state. It is the runtime proof of invariants I1 and I2:

omgdb verify app.omgdb

Compaction

Over time the log accumulates superseded records (overwritten documents, tombstones, aborted transactions). Compaction rewrites the log to its minimal canonical form: one define per collection spec, one create_index per index (keys sorted), and one insert per surviving document in _id order.

omgdb compact app.omgdb

Compaction is crash-safe. The minimal log is written to a temporary oplog.ndjson.compacting file and fsynced, then read back with full CRC verification and a record-count check (rejecting any truncated tail). Only then is it atomically renamed over oplog.ndjson. The original file stays intact until the rename, so a failed rename is recoverable by reopening the original. An orphaned .compacting temp left by a crash is removed on the next open. The deterministic minimal form is invariant I3 (export stability): a replay of a compacted log yields identical state.
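
The write, verify, rename sequence can be sketched with the standard library. `replace_log_atomically` is a hypothetical helper, not OMGDB's implementation. It assumes every line in `minimal_lines` is already framed and ends in exactly one newline, and it checks only the record count and the absence of a torn tail; the full CRC verification described above is omitted.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

fn replace_log_atomically(dir: &Path, minimal_lines: &[String]) -> io::Result<()> {
    let live = dir.join("oplog.ndjson");
    let temp = dir.join("oplog.ndjson.compacting");

    // 1. Write the minimal log to the temp file and fsync it.
    {
        let mut f = File::create(&temp)?;
        for line in minimal_lines {
            f.write_all(line.as_bytes())?;
        }
        f.sync_all()?;
    }

    // 2. Read it back: the record count must match and there must be no torn tail.
    let written = fs::read(&temp)?;
    let records = written.iter().filter(|&&b| b == b'\n').count();
    let ends_clean = written.is_empty() || written.ends_with(b"\n");
    if records != minimal_lines.len() || !ends_clean {
        let _ = fs::remove_file(&temp);
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "compacted log failed read-back check",
        ));
    }

    // 3. Only now replace the live log; the original is intact until this call.
    fs::rename(&temp, &live)
}
```

If the read-back check fails, the function returns before the rename, so the original `oplog.ndjson` is never touched.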

The single-file `.omgdb` archive

A live store is a directory, but you often want to move, copy, or attach it as one file. omgdb pack bundles the store directory into a single .omgdb archive, and omgdb unpack restores it into a fresh directory.

# Bundle a live store directory into one file
omgdb pack app.omgdb app.omgdb.pack

# Restore it into a new, empty store directory
omgdb unpack app.omgdb.pack restored.omgdb

Note: The .omgdb archive is the transport/archive form, not the live storage engine. The engine always runs against a store directory; you unpack an archive back into a directory before opening it.

Archive format

The archive is a tiny, dependency-free, legible format you can also cat. It begins with a magic header line (OMGDB-PACK v1) followed by one or more entries:

OMGDB-PACK v1
FILE <relative-path> <byte-len>\n<raw bytes>\n

pack bundles the canonical oplog.ndjson plus the change-audit sidecars (pending/, changes/) in deterministic, sorted order. The transient advisory LOCK file and any scratch files are intentionally left out. Because the op-log fully determines state (I1), an unpacked store replays to exactly the packed one.
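
The entry layout is simple enough to write and parse by hand. The sketch below works on in-memory byte buffers instead of a directory tree. It assumes the magic line ends in a newline and that entries are sorted by relative path; `pack` and `unpack` here are hypothetical functions, not the CLI's implementation.

```rust
const MAGIC: &[u8] = b"OMGDB-PACK v1\n";

/// Serialize `(relative path, bytes)` pairs, sorted by path.
fn pack(entries: &[(&str, &[u8])]) -> Vec<u8> {
    let mut sorted = entries.to_vec();
    sorted.sort_by(|a, b| a.0.cmp(b.0));
    let mut out = MAGIC.to_vec();
    for (path, bytes) in sorted {
        out.extend_from_slice(format!("FILE {} {}\n", path, bytes.len()).as_bytes());
        out.extend_from_slice(bytes);
        out.push(b'\n');
    }
    out
}

/// Parse an archive back into `(path, bytes)` pairs.
fn unpack(archive: &[u8]) -> Result<Vec<(String, Vec<u8>)>, String> {
    if !archive.starts_with(MAGIC) {
        return Err("missing magic header".to_string());
    }
    let mut rest = &archive[MAGIC.len()..];
    let mut entries = Vec::new();
    while !rest.is_empty() {
        let eol = rest
            .iter()
            .position(|&b| b == b'\n')
            .ok_or("unterminated entry header")?;
        let header = std::str::from_utf8(&rest[..eol]).map_err(|e| e.to_string())?;
        let spec = header.strip_prefix("FILE ").ok_or("entry does not start with FILE")?;
        // The length is the last space-separated token on the header line.
        let (path, len) = spec.rsplit_once(' ').ok_or("missing byte length")?;
        let len: usize = len.parse().map_err(|_| "bad byte length")?;
        let body = &rest[eol + 1..];
        // The length prefix says how many raw bytes to take, so newlines
        // inside the body (as in oplog.ndjson) are not delimiters.
        if body.len() <= len || body[len] != b'\n' {
            return Err("truncated entry".to_string());
        }
        entries.push((path.to_string(), body[..len].to_vec()));
        rest = &body[len + 1..];
    }
    Ok(entries)
}
```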

unpack refuses to write into a directory that already contains a store (it checks for an existing oplog.ndjson), so it never clobbers live data. It also rejects unsafe entry paths, meaning absolute paths and any path with a .. (parent-directory) or root component, so a crafted archive cannot escape the destination directory.
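
A path check of this kind can be written with `std::path::Component`. The sketch below is somewhat stricter than the rule described above, since it also rejects empty paths and a leading `./`; `is_safe_entry_path` is a hypothetical name.

```rust
use std::path::{Component, Path};

/// Accept only relative paths made entirely of ordinary name components.
fn is_safe_entry_path(path: &str) -> bool {
    // `Component::Normal` excludes RootDir, Prefix (Windows drives),
    // ParentDir (`..`), and a leading CurDir (`.`).
    !path.is_empty()
        && Path::new(path)
            .components()
            .all(|c| matches!(c, Component::Normal(_)))
}
```

With this rule `pending/1.json` is accepted, while `/etc/passwd`, `../evil`, and `a/../../evil` are all rejected.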

Single-process exclusive lock

Opening a store acquires an exclusive advisory lock via a LOCK file in the store directory. While the first process holds the store open, a second open of the same directory is cleanly refused:

// store.rs test second_open_is_locked_out
let first = Store::open(&dir).unwrap();
assert!(matches!(Store::open(&dir), Err(StoreError::Locked { .. })));
drop(first);
let _reopened = Store::open(&dir).unwrap(); // available again after the first is dropped

The lock is held for the lifetime of the open store and released when it is dropped.

Limitation: Concurrency is single-process and single-writer only, enforced by the advisory lock plus serialized mutation. There is no multi-reader/multi-writer model; use one process at a time.

Note: On POSIX filesystems the store directory is also fsynced after open and after a compaction rename, so that new directory entries survive a crash. Directory fsync is a deliberate no-op on Windows.