Retrieval is the path that turns the user’s next prompt into a block of context the agent can read before responding. When a prompt event arrives, the Collector asks retrieval to find relevant memory records, format them, and hand back a string — all within a bounded time window so the agent is never kept waiting. Retrieval is inline: it runs during the shim’s POST and rides back in the same HTTP response. It is also best-effort: if the search fails or runs long, an empty context is returned and the prompt goes through unchanged. The agent proceeds either way.
## When it runs
Retrieval fires when two conditions are both true:

- The shim appends `?retrieve=true` to the POST.
- The event’s `kind` is `prompt`.
## Extracting a query
The prompt event’s body determines the search string. Bodies come in three shapes, and each maps differently:

| Body type | What becomes the query |
|---|---|
| `text` | The body’s content, verbatim |
| `message` | The content of the last turn |
| `json` | The body data, serialized as JSON |
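The mapping above is simple enough to sketch directly. The field names below (`type`, `content`, `turns`, `data`) are illustrative assumptions, not the actual kiro-learn event schema:

```python
import json

def extract_query(body: dict) -> str:
    # Hypothetical field names; only the three-way mapping is from the docs.
    if body["type"] == "text":
        return body["content"]                  # verbatim
    if body["type"] == "message":
        return body["turns"][-1]["content"]     # last turn only
    return json.dumps(body["data"])             # serialized as JSON

print(extract_query({"type": "text", "content": "migrate the user table"}))
# → migrate the user table
```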
## How search works
The query layer is a thin seam over the database. It forwards a namespace, a query string, and a limit to the storage backend and returns whatever comes back. No ranking, no reshuffling, no filtering beyond what storage does. The seam exists so later versions can layer hybrid retrieval (lexical plus vector plus recency) without rewriting the collector. Storage handles the actual search in two stages: a primary full-text path and a substring fallback.

### Primary path: full-text search
The primary path uses SQLite’s FTS5 full-text index on memory record titles and summaries. FTS5 gives real ranking (BM25) and respects stemming and diacritics, so a search for “migrations” will hit records that mention “migration” and “migrate”. The catch is that FTS5’s match expression is a small query language: bare words, operators like `AND`/`OR`/`NOT`, special characters like `*` and `(`. Passing a raw user prompt into it is both unsafe and useless: typos can cause parse errors, and operator words get interpreted as operators instead of content. So the query is rewritten before it reaches the index.
The rewrite, in order:
- Tokenize. Split the query on whitespace. Drop empty tokens. Deduplicate, preserving the order of first appearance.
- Rank and cap. If the tokenized list exceeds 32 terms, rank tokens by inverse document frequency and keep the top 32. Rare terms are more discriminating, so they survive; common terms get dropped first. IDF is read from FTS5’s own vocabulary table, so “rare” is defined relative to the actual memory corpus, not some external dictionary.
- Quote and join. Wrap each retained token in double quotes (which makes it an FTS5 phrase with no operator semantics) and join with the `OR` keyword.
`migrate the user table to use uuid` becomes an OR of seven quoted phrases. A prompt like `what's the migration status?` survives the apostrophe, the question mark, and the word “what” without crashing the parser.
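The three rewrite steps can be sketched as a single function. One simplification: the real system reads IDF from FTS5’s vocabulary table, while this sketch takes a precomputed IDF map as an argument:

```python
def sanitize(query: str, idf=None, cap: int = 32) -> str:
    # 1. Tokenize: split on whitespace, drop empties, dedupe keeping
    #    the order of first appearance.
    tokens = list(dict.fromkeys(t for t in query.split() if t))
    # 2. Rank and cap: past the cap, keep the highest-IDF (rarest) terms.
    #    A real implementation would read IDF from the FTS5 vocab table.
    if len(tokens) > cap:
        idf = idf or {}
        tokens = sorted(tokens, key=lambda t: idf.get(t, 0.0), reverse=True)[:cap]
    # 3. Quote and join: each token becomes an FTS5 phrase (embedded
    #    double quotes doubled), joined with OR.
    return " OR ".join('"{}"'.format(t.replace('"', '""')) for t in tokens)

print(sanitize("migrate the user table to use uuid"))
# → "migrate" OR "the" OR "user" OR "table" OR "to" OR "use" OR "uuid"
```

Because every token is quoted, operator words like `what` or stray punctuation reach FTS5 as literal phrase content, never as syntax.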
Namespace isolation is applied in the SQL itself — the statement joins against `mr.namespace LIKE ? || '%'` — so a project can only ever see its own memories, regardless of what the query looks like.
### Fallback path: substring search
The tokenizer aims to produce syntactically valid FTS5 expressions, but SQLite’s FTS5 parser has enough corner cases that defending against all of them is not worth the effort. Instead, retrieval leans on a fallback: if FTS5 refuses the sanitized query for any reason, the storage layer catches the error and runs a `LIKE '%query%'` search against the original user string instead. `LIKE` is slower and does no ranking — results come back ordered by creation date — but it always works, and it treats the query as a literal substring (the escape clause neutralizes `%` and `_` so a prompt containing those characters still means what it says).
The tradeoff is explicit: availability over ranking quality. A search that returns weaker results is better than one that errors out, because a failed search shows up to the agent as “no context at all” and that is worse than substring hits.
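A runnable sketch of the two-stage search, using Python’s built-in `sqlite3` (assuming FTS5 is compiled in, as it is in most modern builds). The schema and column names (`memory_records`, `memory_fts`) are illustrative assumptions, not the actual kiro-learn tables:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE memory_records(
        id INTEGER PRIMARY KEY, namespace TEXT,
        title TEXT, summary TEXT, created_at TEXT);
    CREATE VIRTUAL TABLE memory_fts USING fts5(
        title, summary, content=memory_records, content_rowid=id);
""")
db.execute("INSERT INTO memory_records VALUES "
           "(1, 'proj-a', 'user table migration', 'moved users to uuid keys', '2024-01-01')")
db.execute("INSERT INTO memory_fts(rowid, title, summary) "
           "SELECT id, title, summary FROM memory_records")

def search(namespace: str, fts_query: str, raw: str, limit: int = 5):
    try:
        # Primary path: FTS5 MATCH, BM25-ranked, namespace-scoped in SQL.
        return db.execute(
            "SELECT mr.id, mr.title FROM memory_fts "
            "JOIN memory_records mr ON mr.id = memory_fts.rowid "
            "WHERE memory_fts MATCH ? AND mr.namespace LIKE ? || '%' "
            "ORDER BY bm25(memory_fts) LIMIT ?",
            (fts_query, namespace, limit)).fetchall()
    except sqlite3.OperationalError:
        # Fallback path: literal substring search on the original user
        # string, newest first; % and _ are escaped so they match
        # themselves rather than acting as wildcards.
        esc = raw.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        pat = "%" + esc + "%"
        return db.execute(
            "SELECT id, title FROM memory_records "
            "WHERE namespace LIKE ? || '%' "
            "AND (title LIKE ? ESCAPE '\\' OR summary LIKE ? ESCAPE '\\') "
            "ORDER BY created_at DESC LIMIT ?",
            (namespace, pat, pat, limit)).fetchall()

print(search("proj-a", '"migration"', "migration"))  # FTS5 path hits
print(search("proj-a", "AND (", "uuid"))             # parse error → LIKE fallback hits
```

The second call deliberately hands FTS5 an unparseable expression; the `OperationalError` is swallowed and the raw string `uuid` finds the record by substring instead.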
## Latency budget
Retrieval runs on the shim’s critical path. Every millisecond spent searching is a millisecond the developer spends staring at a blank prompt. So the assembler enforces a hard budget — default 500 ms — on the search step.

The mechanism is a race between two promises: the query and a timeout. Whichever resolves first wins. If the timer wins, retrieval returns an empty context with the elapsed latency recorded. The search may still complete in the background; the storage layer is not cancelled. But nothing is injected into the prompt.

The assembler catches every error too. An unexpected exception inside the query, a storage outage, an unhandled rejection — all of them resolve to the same empty-result response. The only information that surfaces to the caller is the latency, which the collector records so the behavior is visible in logs and the viewer UI.

Why 500 ms? It is long enough that full-text search over a realistic memory corpus finishes comfortably, and short enough that the developer does not notice it. Prompts do not feel laggy. The budget is configurable per collector instance if a specific deployment wants different tuning.

## Context assembly
Once records come back, the assembler turns them into a markdown block the agent can read. The format is intentionally plain: a header, then each record’s summary and facts. When nothing matches, the response still carries its metadata (`latency_ms` and an empty `records` array), but the `context` field is empty. The shim detects the empty string and does not write anything to stdout, so no header is injected if there is nothing to say.
The record IDs are returned separately in the `records` field. That field is for audit and debugging — the viewer UI shows which memories were retrieved on a given prompt — not for the agent’s consumption. The agent sees only the `context` string.
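Under those constraints, the assembly step is a small pure function. The header text and record fields below (`title`, `summary`, `facts`) are assumptions based on this page, not the exact kiro-learn format:

```python
def assemble_context(records: list) -> str:
    # Hypothetical rendering of the fixed markdown block.
    if not records:
        return ""   # empty string → the shim writes nothing to stdout
    lines = ["## Prior observations", ""]
    for r in records:
        lines.append(f"- **{r['title']}**: {r['summary']}")
        for fact in r.get("facts", []):
            lines.append(f"  - {fact}")
    return "\n".join(lines)

print(assemble_context([{"title": "user table migration",
                         "summary": "moved users to uuid keys",
                         "facts": ["ids are uuidv4"]}]))
```

The empty-list branch is the important one: returning `""` rather than a headed-but-empty block is what lets the shim skip injection entirely.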
## What comes back to the agent
The shim writes the context string to stdout. In both Kiro CLI and Kiro IDE, the agent runtime reads that stdout and prepends it to the prompt before invoking the model. From the model’s perspective, the prompt now starts with a “Prior observations” section and then continues with whatever the user typed. From the developer’s perspective, retrieval is invisible. The prompt goes in, the response comes out. If kiro-learn has relevant prior context, the response reflects it. If not, the response is the same one the model would have produced anyway.

## Key design decisions
Retrieval never throws. The assembler returns a `RetrievalResult` under every condition: success, empty result, FTS5 error, storage error, timeout. There is no path where retrieval breaks an agent turn. The cost of this is quiet failures — if retrieval is consistently returning empty results, the symptom is “memory feels useless” rather than a visible error. The viewer UI’s latency-per-retrieval metric is the counterweight.
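The never-throw contract and the budget race collapse into one wrapper. This is a sketch in asyncio terms (the real system races JavaScript promises), and the result fields are assumptions abridged from this page — note the losing search task is not cancelled, matching the documented behavior:

```python
import asyncio, time

async def assemble(search_coro, budget_ms: int = 500):
    # Race the search against a timer; every outcome (success, timeout,
    # exception) resolves to the same result shape. Context assembly is
    # elided from this sketch.
    start = time.monotonic()
    task = asyncio.ensure_future(search_coro)
    done, _pending = await asyncio.wait({task}, timeout=budget_ms / 1000)
    if task in done and task.exception() is None:
        records = task.result()          # the search won the race
    else:
        records = []                     # timer won, or the search threw
    return {"records": records,
            "latency_ms": int((time.monotonic() - start) * 1000)}

async def demo():
    async def ok():
        return [{"id": "m1"}]
    async def slow():
        await asyncio.sleep(0.2)
        return [{"id": "m2"}]
    async def boom():
        raise RuntimeError("storage outage")
    print(await assemble(ok()))                   # records survive
    print(await assemble(slow(), budget_ms=50))   # timer wins → empty
    print(await assemble(boom()))                 # error → empty, no throw

asyncio.run(demo())
```

All three demo calls return normally; the caller only ever learns the latency and whatever records made it back in time.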
Availability over ranking. The LIKE fallback exists so every reasonable query eventually produces results, even when the FTS5 sanitizer misses an edge case. Worse rankings are a better failure mode than no rankings at all.
Budget enforced at assembly, not at storage. The 500 ms budget wraps the entire search, including whatever the storage backend does internally. If a future storage backend — vector search, remote KB — is slower than FTS5, the budget still applies. Storage does not need to know about it.
Formatting is fixed. The context format is not configurable. Every agent sees the same structure: a header, a list of records, summaries, facts. Keeping the format fixed means the model has one consistent shape to parse across every turn, and the project does not accumulate formatting flags the way a template engine would.
Inline, not pull. Retrieval runs on the prompt event’s ingest path, not as a separate API call. The shim gets one HTTP round-trip per prompt — POST the event, get the context back. This is different from the MCP server, which exposes a pull-based `search_memory` tool for agents that prefer to query explicitly. The inline path is the default; the MCP path is available when the agent wants more control.
## Related pages

- **Database**: the FTS5 index that backs the primary search path
- **Viewer**: where retrieved records surface in the dashboard
- **Extraction**: where the memory records retrieval searches over come from
- **Summarization**: turn summaries that surface through the same retrieval path
- **Collector**: the daemon that invokes retrieval on prompt ingestion
- **Kiro CLI shim**: how the CLI shim requests retrieval and writes context to stdout
- **Kiro IDE shim**: how the IDE shim requests retrieval and writes context to stdout