Native Markdown

Import a Markdown file as one queryable document — frontmatter becomes typed top-level fields, headings become a section list, and the raw body is stored alongside.


OMGDB treats Markdown as a first-class input. A .md file — a spec, a note, a piece of documentation — can be imported as a single document where the YAML-ish frontmatter becomes typed, queryable top-level fields and the heading structure is captured under _sections. The raw Markdown body is stored too, so nothing is lost. This is the knowledge layer an agent reads from: notes and specs land as ordinary documents you can query by their frontmatter.

Import is exposed through the CLI command omgdb import-md. Internally it parses the file, builds a document, inserts it, and prints the new _id.

Importing a file

omgdb create app.omgdb
omgdb import-md app.omgdb docs spec.md
# -> prints the new document's _id as canonical JSON

The command takes three positional arguments: the store path, the target collection, and the path to the Markdown file. The whole file becomes exactly one document in the named collection.

Argument      Description
<path>        Store path, e.g. app.omgdb.
<collection>  Collection the document is inserted into.
<file>        Path to the .md file to parse and import.

The document shape

A parsed Markdown file is converted into a document with three parts:

  1. Frontmatter fields — each key: value line from the frontmatter block is lifted to a top-level field, typed where possible.
  2. body — the raw Markdown after the frontmatter, stored verbatim as a string.
  3. _sections — an array of { level, heading, text } objects, one per ATX heading, in document order.

Frontmatter parsing

Frontmatter is recognized only when the file begins with a literal --- fence line and a matching closing --- fence follows. Inside the block, each non-empty line is split on its first : into a key and a value. The value is parsed as JSON when possible and otherwise kept as a string:

  • views: 10 becomes an integer (I64).
  • tags: ["rag", "db"] becomes an array.
  • title: Hello World stays a string (unquoted prose is not valid JSON, so it falls back to a string).

Limitation: Frontmatter is a minimal line-based subset, not a full YAML parser. It splits on the first : per line and parses each value as JSON-or-string. Block scalars, nested maps, and multi-line values are not supported. A file with no --- fences yields no frontmatter fields, and the entire text becomes body.
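
The rules above can be sketched in a few lines of Python. This is an illustrative approximation, not OMGDB's actual parser: the function name split_frontmatter is hypothetical, and details the text does not specify (skipping lines that contain no colon, tolerating only a trailing newline on the fence line) are assumptions.

```python
import json


def split_frontmatter(text):
    """Return (fields, body) for a Markdown string, following the rules above."""
    lines = text.splitlines(keepends=True)
    # Frontmatter counts only when the very first line is a literal --- fence.
    if not lines or lines[0].rstrip("\r\n") != "---":
        return {}, text
    # Find the matching closing fence; without one, everything is body.
    close = next(
        (i for i in range(1, len(lines)) if lines[i].rstrip("\r\n") == "---"),
        None,
    )
    if close is None:
        return {}, text
    fields = {}
    for line in lines[1:close]:
        if not line.strip() or ":" not in line:
            continue  # assumption: blank lines and lines without ':' are skipped
        key, _, raw = line.partition(":")  # split on the first ':' only
        raw = raw.strip()
        try:
            fields[key.strip()] = json.loads(raw)  # typed when the value is valid JSON
        except json.JSONDecodeError:
            fields[key.strip()] = raw  # otherwise kept as a string
    return fields, "".join(lines[close + 1:])
```

Because the split happens on the first colon only, a value such as a URL keeps its later colons and falls back to a string.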

Sections

Every ATX heading (# through ######) starts a new section. A line is treated as a heading only when, after trimming leading whitespace, it begins with 1–6 # characters immediately followed by a single space. Each section records its heading level, the heading text (with the leading #s stripped), and the text body up to the next heading.

Note: _sections is a flat, ordered array, not a tree. Heading levels are recorded in the level field, but no parent/child nesting is built. Content before the first heading is not captured in any section; it is still present in body.
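
As a rough illustration of the heading rule, the following Python sketch splits a body into a flat list of sections. It is not the importer's code: split_sections is a hypothetical name, and the whitespace handling of text is an assumption (the sketch strips surrounding whitespace, which may not match the importer byte for byte).

```python
import re

# 1-6 '#' characters followed by a space, tested after trimming leading whitespace.
HEADING = re.compile(r"^(#{1,6}) (.*)$")


def split_sections(body):
    """Return a flat, ordered list of {level, heading, text} dicts."""
    sections = []
    for line in body.splitlines():
        match = HEADING.match(line.lstrip())
        if match:
            sections.append(
                {"level": len(match.group(1)), "heading": match.group(2).strip(), "text": []}
            )
        elif sections:
            sections[-1]["text"].append(line)
        # Lines before the first heading belong to no section.
    for section in sections:
        # Assumption: surrounding whitespace is stripped from each section's text.
        section["text"] = "\n".join(section["text"]).strip()
    return sections
```

A line such as #nospace or one with seven # characters fails the pattern, so it is treated as ordinary text rather than a heading.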

Example

Given this spec.md:

---
title: Hello World
views: 10
tags: ["rag", "db"]
---
# Intro

the intro text

## Details

more text

Importing it produces a document of this shape:

{
  "title": "Hello World",
  "views": 10,
  "tags": ["rag", "db"],
  "body": "# Intro\n\nthe intro text\n\n## Details\n\nmore text\n",
  "_sections": [
    { "level": 1, "heading": "Intro",   "text": "the intro text\n" },
    { "level": 2, "heading": "Details", "text": "more text" }
  ]
}

Note that title came through as a string, views as a number, and tags as an array — each frontmatter line was typed according to whether its value parsed as JSON.

Querying frontmatter

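Because frontmatter fields are stored as ordinary top-level fields, they are queryable like any other field, with no special syntax. After import you can filter on a frontmatter value, here the title from the spec.md example above. Wrap the JSON filter in single quotes so the shell passes it through unchanged:

omgdb find app.omgdb docs '{"title":"Hello World"}'
# result contains "Intro" and "_sections"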

The same applies to the typed fields. A numeric frontmatter field works with the comparison operators, and an array field works with array operators:

omgdb find app.omgdb docs '{"views":{"$gte":10}}'
omgdb find app.omgdb docs '{"tags":"rag"}'
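
When these commands are generated from a script, the filter has to reach omgdb as a single argument with its double quotes and $ intact. The sketch below builds a shell-safe command line using only Python's standard library; find_command is a hypothetical helper, and it assumes nothing about omgdb beyond the find <path> <collection> <filter> form shown above.

```python
import json
import shlex


def find_command(store, collection, query):
    """Build a shell-safe `omgdb find` command line for a filter given as a dict."""
    # Compact JSON keeps the filter identical to the hand-written examples.
    filter_json = json.dumps(query, separators=(",", ":"))
    # shlex.join quotes each argument so quotes and $ survive a POSIX shell.
    return shlex.join(["omgdb", "find", store, collection, filter_json])


print(find_command("app.omgdb", "docs", {"views": {"$gte": 10}}))
# prints: omgdb find app.omgdb docs '{"views":{"$gte":10}}'
```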

The _sections array travels with the document, so the heading structure is available to read back, and the raw body is always present for full-text or downstream processing.
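
Since _sections is stored flat, a client that wants nesting has to derive it from the level field after reading the document back. One possible way to do that is sketched below; build_tree and the children key are illustrative names and not part of OMGDB.

```python
def build_tree(sections):
    """Nest a flat, ordered _sections list by heading level (client-side only)."""
    root = {"level": 0, "children": []}
    stack = [root]  # the chain of currently open ancestors
    for section in sections:
        node = {**section, "children": []}
        # Close every open section at the same or a deeper level.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


doc_sections = [
    {"level": 1, "heading": "Intro", "text": "the intro text\n"},
    {"level": 2, "heading": "Details", "text": "more text"},
]
tree = build_tree(doc_sections)
print(tree[0]["heading"], "->", tree[0]["children"][0]["heading"])
# prints: Intro -> Details
```

The stored document is left untouched; the tree is rebuilt on each read, so it carries no stable per-section identity.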

Limitations and planned work

The importer captures structure and types, but a few capabilities are explicitly follow-ups rather than current behavior:

Limitation: A byte-offset source map for sections is planned (not yet implemented). Each Section stores only level, heading, and text — there are no source byte offsets, so a section cannot be mapped back to its exact position in the original file.

Limitation: Addressable section edits (setSection / patchFrontmatter) are planned (not yet implemented). _sections is a plain array snapshot with no stable per-section identity, so there is no API to surgically rewrite a single heading-delimited section in place.

Note: Heading-aware chunking into embeddings is not part of import. Embedding-based retrieval over sections is provided by the vector layer — see vector search.

See also

  • Data model — how documents, types, and field paths work, including the conventions imported Markdown follows.
  • Vector search — embedding and retrieval over imported content, including section-level chunking.
  • Query operators — filtering on the frontmatter fields produced by import.
