The database is the persistence layer for kiro-learn. Every event the Collector ingests and every memory record Extraction produces ends up here. It is a single SQLite file on disk, accessed through a small, well-defined interface. Nothing ever leaves your machine from this layer. The database file lives at ~/.kiro-learn/kiro-learn.db.
What it stores
Two things, in two tables:

| Table | What it stores | Written by |
|---|---|---|
| events | Raw events exactly as the shims posted them (after cleaning) | Ingestion pipeline |
| memory_records | Structured memories extracted from batches of events | Extraction worker |
Two supporting tables round out the schema:

| Table | What it stores |
|---|---|
| memory_records_fts | Full-text search index over memory record titles, summaries, and facts |
| _migrations | One row per applied schema migration — the bookkeeping the runner uses to detect drift |
The events table
Every incoming event lands here as one row. The schema mirrors the wire format — event ID, session ID, actor ID, namespace, kind, body, timestamps — with the body and source block stored as JSON blobs. The primary key is event_id (a ULID), so duplicate POSTs from a retrying shim are a no-op instead of an error.
The table is indexed for the two common access patterns: listing events for a project in reverse chronological order, and looking up an event by its session or parent.
Two timestamp columns track different notions of “when”:

- valid_time — when the event actually happened (set by the shim).
- transaction_time — when the database recorded it (stamped on insert).
The memory_records table
Each row is one memory record: a title, a summary, a list of facts, a list of concepts, a list of files touched, an observation type, and the IDs of the source events that produced it. The source_event_ids field links every memory back to the raw events it came from, so you can always trace a memory to its provenance.
The primary key is record_id (mr_ + ULID). Unlike events, memory records are never supposed to collide — a duplicate record ID is an upstream bug, so the database rejects the insert loudly instead of silently ignoring it.
STRICT tables
Both user-facing tables are declared STRICT. This is SQLite’s opt-in type-safety mode. In a non-strict table, inserting a string into an INTEGER column silently succeeds and stores the string. In a strict table, the same insert fails loudly.
For a persistence layer that serializes wire data validated by Zod on the way in, silent type coercion is a liability, not a feature. STRICT closes that gap — if a code change accidentally passes a number where a string is expected, the database catches it at insert time instead of letting a malformed row drift into storage.
Full-text search
Memory records need to be searchable by content. Retrieval pulls relevant memories into the agent’s prompt, and the match quality directly shapes how useful the feature is. kiro-learn uses SQLite’s built-in FTS5 full-text search extension. When a memory record is inserted into memory_records, a companion row is also inserted into the memory_records_fts virtual table. Both inserts happen inside a single transaction — either both rows land or neither does.
The FTS row indexes three fields:

- title
- summary
- facts_text — the record’s facts joined into a single searchable blob
record_id and namespace are carried as UNINDEXED columns so the search path can join back to the primary table and filter results by project without a separate lookup.
Tokenizer
The FTS5 index uses a three-stage tokenizer: porter unicode61 remove_diacritics 2. Each stage does one thing:

| Stage | What it does |
|---|---|
| porter | Stems tokens to their root form — “migrating”, “migrates”, “migration” all collapse to “migrat” |
| unicode61 | Unicode-aware word segmentation and case folding |
| remove_diacritics 2 | Strips diacritical marks so “café” matches “cafe” |
Availability over ranking
Retrieval builds a sanitized FTS5 query by tokenizing the prompt and emitting an OR of quoted phrases. If FTS5 rejects the query anyway — its grammar has corners that are hard to fully account for — the storage layer catches the error and falls back to a LIKE '%query%' search against the original string. The LIKE path is slower and unranked (results come back by creation date), but it always works. The explicit tradeoff is availability over ranking quality: a worse-ordered result set is better than an error.
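A minimal sketch of both halves of that strategy. The tokenizing and escaping rules here are illustrative, not the exact ones kiro-learn ships:

```typescript
// Turn a free-form prompt into an FTS5 query: an OR of quoted phrases.
// Quoting every token sidesteps most of the FTS5 query grammar
// (bare parentheses, colons, NEAR, etc. would otherwise be parsed as syntax).
function toFtsQuery(prompt: string): string {
  const tokens = prompt.match(/[\p{L}\p{N}]+/gu) ?? [];
  return tokens.map((t) => `"${t}"`).join(" OR ");
}

// Escape %, _ and the escape character itself for the LIKE '%...%' fallback,
// so user input cannot smuggle in wildcards.
function toLikePattern(prompt: string): string {
  return "%" + prompt.replace(/[\\%_]/g, (c) => "\\" + c) + "%";
}

console.log(toFtsQuery("migrate the schema (v2)")); // "migrate" OR "the" OR "schema" OR "v2"
console.log(toLikePattern("100%_done"));            // %100\%\_done%
```

The LIKE pattern would be bound as a parameter with `ESCAPE '\'`, never concatenated into the SQL.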
Migrations
The schema evolves over time. New features need new columns, new indexes, or new constraints. Rather than handwriting upgrade scripts, kiro-learn uses a small migration runner that applies code-embedded DDL in order and tracks what has been applied. Rules the runner enforces:

- Append-only. New migrations are added at the end with the next integer version. Released migrations are never reordered, renamed, or edited in place.
- Transactional. Each migration runs inside a BEGIN/COMMIT. If the DDL throws, the transaction rolls back and _migrations is left untouched — the next run tries again from the same point.
- Idempotent. Running the runner twice with the same list is a no-op. After every migration is applied, subsequent opens of the database do no DDL work.
- Drift-fatal. If _migrations records a version whose name disagrees with the code, the runner throws MigrationDriftError and refuses to proceed. This catches the case where migrations were renamed or reordered after being applied.
- Forward-only. There are no down-migrations. A broken migration is rolled back at the transaction level by SQLite itself; the developer fixes the DDL and re-runs.
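The core of those rules fits in a few lines. This sketch uses an in-memory map in place of the _migrations table, and the names and shapes are illustrative, not the real runner's API:

```typescript
interface Migration {
  version: number;
  name: string;
  ddl: () => void; // runs inside BEGIN/COMMIT in the real runner
}

class MigrationDriftError extends Error {}

// `applied` stands in for the _migrations table: version -> name.
function runMigrations(migrations: Migration[], applied: Map<number, string>): void {
  for (const m of migrations) {
    const seen = applied.get(m.version);
    if (seen !== undefined) {
      // Drift-fatal: an applied version whose name disagrees with the code.
      if (seen !== m.name) {
        throw new MigrationDriftError(`v${m.version}: applied as "${seen}", code says "${m.name}"`);
      }
      continue; // idempotent: already applied, no DDL work
    }
    m.ddl();                        // may throw; nothing is recorded if it does
    applied.set(m.version, m.name); // record only after the DDL succeeds
  }
}

const applied = new Map<number, string>();
let ran = 0;
const list: Migration[] = [
  { version: 1, name: "init", ddl: () => { ran++; } },
  { version: 2, name: "xml_extraction_fields", ddl: () => { ran++; } },
];
runMigrations(list, applied);
runMigrations(list, applied); // second run is a no-op
console.log(ran); // 2
```

Renaming an already-applied migration in the code would make the next run throw MigrationDriftError instead of quietly diverging.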
The migration history
The schema has evolved through four migrations so far:

| Version | Name | What it does |
|---|---|---|
| 0001 | init | Creates events, memory_records, memory_records_fts, and _migrations, plus the initial indexes |
| 0002 | xml_extraction_fields | Adds concepts_json, files_touched_json, and observation_type columns to memory_records for the XML extraction pipeline |
| 0003 | project_path | Adds a nullable project_path column to events plus a compound index on (namespace, project_path) for project listing |
| 0004 | session_summary_type | Widens the observation_type CHECK constraint to include 'session_summary' (needed because SQLite has no ALTER CONSTRAINT, so the table is rebuilt) |
Privacy
Everything in the database is local. There is no remote replica, no cloud sync, no background upload. The only path out of your machine is extraction, where batches of events are sent to Amazon Bedrock via kiro-cli for LLM processing — and even that traffic goes through your own AWS account, using your credentials.
<private> is stripped before it reaches the database
Users can wrap sensitive content in <private>...</private> tags, and those spans are replaced with [REDACTED] before storage. The redaction happens in the collector’s cleaning pipeline, not in the database. By the time an event arrives at putEvent, the private spans are already gone.
This is a deliberate layering choice. Storage is a sink: if scrubbing lived here, a caller that forgot to scrub would still land scrubbed data in the database but would never see the private spans again and could build mistaken upstream assumptions (“storage protects us”). Centralizing the scrub in the pipeline puts the guarantee in one place and makes it visible. A guard test enforces that the storage layer never references <private> at all.
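The scrub itself is simple. A sketch of the pipeline-side redaction, assuming a flat, non-nested tag grammar; the real cleaning pipeline may handle more cases:

```typescript
// Replace every <private>...</private> span with a redaction marker.
// Non-greedy match so two spans in one string are redacted separately;
// [\s\S] lets a span cross line breaks.
function redactPrivate(text: string): string {
  return text.replace(/<private>[\s\S]*?<\/private>/g, "[REDACTED]");
}

console.log(redactPrivate("token is <private>sk-123</private>, then deploy"));
// token is [REDACTED], then deploy
```

Because this runs in the collector before putEvent, the storage layer never sees the private spans at all.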
File-system permissions
The installer creates ~/.kiro-learn/ with mode 0700 — owner-read/write/execute only. Other users on a shared machine cannot read your memory database. The storage layer does not widen these permissions; it inherits whatever the parent directory grants.
The storage interface
Callers do not import from the SQLite module directly. The collector’s pipeline code, retrieval code, and read API all see a single pluggable interface: StorageBackend. Each caller takes a StorageBackend as a dependency and never asks which backend it is. A guard test enforces this: any file outside src/collector/storage/sqlite/ that tries to import from that directory fails CI.
The point of this split is future-proofing without disruption. Later versions will ship alternative backends (Postgres with pgvector, Bedrock AgentCore Memory), and swapping one in becomes a matter of choosing a different opener at startup. The pipeline, retrieval, and UI code does not change at all.
Method contracts
The interface is small because the contracts are strict. A few worth calling out:

- putEvent is idempotent. The insert uses INSERT OR IGNORE on the event ID primary key, so a retry from a confused shim is a silent no-op. The original transaction_time is preserved on collision — a second insert does not overwrite it.
- putMemoryRecord rejects collisions. Memory record IDs are generated fresh on every extraction. A collision is an upstream bug, so the database surfaces it as an error instead of silently dropping the write.
- getEventById returns null for misses. Not an exception. The caller decides whether a missing event is a problem.
- searchMemoryRecords is availability-biased. When FTS5 rejects a query, it falls back to LIKE instead of failing. The cost is worse ranking on the fallback path; the benefit is that retrieval never errors out on a weird prompt.
- close is idempotent. A second close is a silent no-op, not an error.
Every method is async even though better-sqlite3 is synchronous. The async surface keeps the interface compatible with future backends that genuinely are async (remote databases, HTTP APIs) without forcing callers to change shape.
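A hypothetical condensed version of the contract, to make the shape concrete; the real interface in src/collector/storage/index.ts has more methods and richer types:

```typescript
interface StorageBackend {
  putEvent(event: { eventId: string; body: unknown }): Promise<void>;       // idempotent
  putMemoryRecord(record: { recordId: string }): Promise<void>;             // rejects collisions
  getEventById(eventId: string): Promise<unknown | null>;                   // null on miss
  searchMemoryRecords(namespace: string, query: string): Promise<unknown[]>;
  close(): Promise<void>;                                                   // idempotent
}

// Callers receive the interface and never ask which backend is behind it.
async function countMisses(storage: StorageBackend, ids: string[]): Promise<number> {
  let misses = 0;
  for (const id of ids) {
    if ((await storage.getEventById(id)) === null) misses++;
  }
  return misses;
}

// An in-memory stub is enough to satisfy the contract in tests.
const memory = new Map<string, unknown>([["evt_1", { kind: "file_edit" }]]);
const stub: StorageBackend = {
  async putEvent(e) { if (!memory.has(e.eventId)) memory.set(e.eventId, e.body); },
  async putMemoryRecord() {},
  async getEventById(id) { return memory.get(id) ?? null; },
  async searchMemoryRecords() { return []; },
  async close() {},
};
countMisses(stub, ["evt_1", "evt_404"]).then((n) => console.log(n)); // 1
```

The async-everywhere surface is what makes the in-memory stub, the SQLite backend, and a future remote backend interchangeable from the caller's side.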
Prepared statements, parameterized queries
Every SQL statement is prepared once when the backend opens and reused for the life of the handle. Every user-controlled value — event body, FTS5 query string, namespace prefix — is bound positionally with ? placeholders. SQL and data are never concatenated.
This is the primary defense against SQL injection. It also happens to be faster: preparing a statement once and binding values on every call avoids re-parsing and re-planning the query on each insert or search.
Key design decisions
SQLite first. SQLite is the simplest thing that covers the v1 requirements: durable local storage, full-text search, multi-process safety (WAL mode), and zero operational overhead. No server to run, no credentials to manage, no port to expose. The trade-off is that it does not scale to multiple machines — which is fine, because kiro-learn is explicitly a local tool until a future version decides otherwise.

One file. Everything lives in ~/.kiro-learn/kiro-learn.db. Backing up your memory is cp. Starting fresh is rm. There is no migration tool, no schema dump format, no cross-file consistency to worry about.
STRICT tables everywhere. SQLite’s default laxity about types is convenient for scripting and hostile for a persistence layer. Opting in to STRICT catches type errors at the boundary instead of letting them propagate.
FTS5 over building lexical search by hand. FTS5 is production-tested, maintained by the SQLite project, and handles tokenization, stemming, and ranking in one extension. Rebuilding any of that would be wasted effort. A later version that wants semantic search can layer vector similarity alongside FTS5 scores — Bedrock Knowledge Bases uses the same hybrid shape, so the algorithm will be portable.
Migrations are code, not files. Each migration’s DDL is a string constant inside a TypeScript module, compiled into the package. There is nothing to read from disk at runtime and nothing extra to ship. The compiled dist/ directory is self-contained.
Interface-first. The StorageBackend interface is the contract every backend satisfies. Writing the SQLite backend against the interface (rather than exposing it directly) forces the contract to stay small and backend-agnostic. Every capability that is hard to express in the interface is a capability a future backend would struggle to provide.
Privacy is a pipeline property, not a storage property. Storage stores what it is given. The pipeline is responsible for redaction. Putting the scrub in the right layer keeps the guarantee auditable and prevents a “defense in depth” that silently swallows bugs.
Code pointers
- src/collector/storage/index.ts — the StorageBackend interface and method contracts
- src/collector/storage/sqlite/index.ts — the SQLite backend implementation
- src/collector/storage/sqlite/statements.ts — every prepared SQL statement in one file
- src/collector/storage/sqlite/fts5.ts — FTS5 query sanitization and LIKE escaping
- src/collector/storage/sqlite/migrations/runner.ts — migration runner with drift detection
- src/collector/storage/sqlite/migrations/0001_init.ts — initial schema
- src/collector/storage/sqlite/migrations/0002_xml_extraction_fields.ts — extraction fields
- src/collector/storage/sqlite/migrations/0003_project_path.ts — project path column
- src/collector/storage/sqlite/migrations/0004_session_summary_type.ts — widened observation type constraint
Related pages
- Viewer — the dashboard that reads from the database via the read API
- Retrieval — how memory records are searched via FTS5
- Extraction — how memory records are produced and inserted
- Compaction — what happens when buffers grow faster than extraction drains them
- Collector — the daemon that writes events to the database