The two-phase extraction pipeline

ZettelForge's memory evolution pipeline, activated by passing evolve=True to remember(), implements a Mem0-inspired two-phase process. It targets the core problem of append-only memory systems: redundancy, contradiction, and noise accumulate over time until retrieval quality degrades.

The MCP server (zettelforge_remember) and web API (POST /api/remember) both enable evolution by default. You can also call remember_with_extraction() directly for programmatic use.

Prerequisites: familiarity with remember() and recall() as covered in the Quickstart.

The problem: append-only memory

The naive approach to agent memory is to store everything: every report paragraph, every conversation turn. Append it and let vector search surface what's relevant later.

This fails at scale in three ways:

Redundancy: "APT28 uses Cobalt Strike" stored 47 times across different reports. Each retrieval returns duplicate context, wasting LLM tokens.
Contradiction: "Server ALPHA is compromised" from January and "Server ALPHA has been remediated" from March are both retrievable. Without evolution, both surface and the agent cannot determine which is current.
Noise: Greetings and meta-commentary stored alongside intelligence. Vector search has no reliable way to distinguish them from substantive content.

Mem0's research on memory management for AI agents (mem0.ai/paper) demonstrated that selective extraction with update operations substantially reduces these problems compared to full-context storage approaches.

Phase 1: extraction (FactExtractor)

FactExtractor takes raw content and distills it into concise, atomic facts. Each fact gets an importance score from 1 to 10. The LLM prompt instructs the extractor to:

Extract only facts worth remembering long-term
Skip greetings, filler, and meta-commentary
Score each fact by importance to the intelligence domain

Facts below min_importance (default: 3) are discarded before Phase 2. This is the first filter — low-value content never reaches the memory store.
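The threshold check itself is a simple comparison. The sketch below assumes a hypothetical `Fact` record with `text` and `importance` fields; it is not ZettelForge's internal type. Because the text says facts *below* the threshold are discarded, this sketch keeps a fact scored exactly at `min_importance`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    # Hypothetical stand-in for one extracted fact, not ZettelForge's own class.
    text: str
    importance: int  # 1-10, as scored by the extractor


def filter_facts(facts: list[Fact], min_importance: int = 3) -> list[Fact]:
    """Discard facts scored below min_importance before they reach Phase 2."""
    return [f for f in facts if f.importance >= min_importance]


facts = [
    Fact("Hi team, quick update below", 1),
    Fact("APT28 shifted to edge device exploitation", 8),
    Fact("Thanks for reading", 2),
]
kept = filter_facts(facts)
print([f.text for f in kept])  # ['APT28 shifted to edge device exploitation']
```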

Configuration:

model (default: qwen2.5:3b) — the LLM used for extraction
max_facts (default: 5 per remember_with_extraction call) — caps facts extracted per input

Fallback behavior: if the LLM is unreachable, the extractor returns the raw content (first 500 characters) as a single fact with importance 5. The system degrades to append-only rather than dropping the write.
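A minimal sketch of the fallback and the `max_facts` cap, assuming a hypothetical `llm_extract` callable that raises `ConnectionError` when the model is unreachable. The exception type ZettelForge actually catches, and whether it trims by order or by score, are assumptions; this version keeps the first `max_facts` facts the LLM returned.

```python
from typing import Callable

FactDict = dict  # {"fact": str, "importance": int}


def extract_with_fallback(
    content: str,
    llm_extract: Callable[[str], list[FactDict]],
    max_facts: int = 5,
) -> list[FactDict]:
    try:
        facts = llm_extract(content)
    except ConnectionError:
        # LLM unreachable: degrade to append-only instead of dropping the write.
        return [{"fact": content[:500], "importance": 5}]
    # Cap how many facts go on to Phase 2.
    return facts[:max_facts]


def unreachable(_: str) -> list[FactDict]:
    # Hypothetical extractor that simulates a down LLM endpoint.
    raise ConnectionError("LLM endpoint not reachable")


print(extract_with_fallback("A" * 800, unreachable)[0]["importance"])  # 5
```

Catching only the connection failure, rather than every exception, keeps genuine bugs in the extraction code visible instead of silently turning them into raw-content writes.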

Example input:

APT28 has shifted tactics. They dropped DROPBEAR and now exploit edge devices
using compromised credentials.

Example extracted facts (requires configured LLM):

[
  {"fact": "APT28 shifted to edge device exploitation", "importance": 8},
  {"fact": "APT28 dropped DROPBEAR malware", "importance": 7},
  {"fact": "APT28 uses compromised credentials for edge access", "importance": 6}
]
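LLM output is not guaranteed to be well-formed, so something has to turn JSON like the example above into typed records. A defensive parser might look like the following sketch. `ExtractedFact` and `parse_facts` are hypothetical names, and skipping malformed entries (rather than failing the whole batch) is a design assumption, not documented ZettelForge behavior.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedFact:
    fact: str
    importance: int


def parse_facts(raw: str) -> list[ExtractedFact]:
    """Parse the extractor's JSON array, skipping malformed or out-of-range entries."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    parsed = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text, score = item.get("fact"), item.get("importance")
        # bool is a subclass of int in Python, so reject it explicitly.
        valid_score = isinstance(score, int) and not isinstance(score, bool) and 1 <= score <= 10
        if isinstance(text, str) and text.strip() and valid_score:
            parsed.append(ExtractedFact(text.strip(), score))
    return parsed


raw = """[
  {"fact": "APT28 shifted to edge device exploitation", "importance": 8},
  {"fact": "APT28 dropped DROPBEAR malware", "importance": 7},
  {"fact": "APT28 uses compromised credentials for edge access", "importance": 6}
]"""
parsed = parse_facts(raw)
print([f.importance for f in parsed])  # [8, 7, 6]
```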

Phase 2: update (MemoryUpdater)

For each extracted fact that passes the importance threshold, the updater determines what to do with it relative to existing memory.

Step 1: Retrieve the top 3 most similar existing notes via vector search.

Step 2: If no similar notes exist, the fact is stored as a new note (status added). Otherwise, the LLM compares the new fact against the retrieved notes and chooses one of four operations:

Status returned	When	What happens
`added`	No similar notes exist, or the LLM judges the fact to be new	New note stored; entities indexed in SQLite and LanceDB
`updated`	Fact refines an existing note	New note stored; old note marked `superseded_by`
`corrected`	Fact contradicts an existing note	Correction note stored; old note marked `superseded_by`
`noop`	Fact already captured	Nothing stored
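Conceptually, Step 1 is a nearest-neighbor query. ZettelForge runs it against LanceDB; the sketch below swaps in brute-force cosine similarity over plain Python lists so it runs without dependencies. `top_k_similar` and its `min_similarity` cutoff (which decides whether any note counts as "similar") are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: list[float],
    notes: dict[str, list[float]],
    k: int = 3,
    min_similarity: float = 0.5,
) -> list[str]:
    """Return up to k note IDs, most similar first, at or above a similarity cutoff."""
    scored = [(cosine(query, vec), note_id) for note_id, vec in notes.items()]
    scored = [pair for pair in scored if pair[0] >= min_similarity]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note_id for _, note_id in scored[:k]]


notes = {
    "n1": [1.0, 0.0, 0.0],
    "n2": [0.9, 0.1, 0.0],
    "n3": [0.0, 1.0, 0.0],
    "n4": [0.0, 0.0, 1.0],
    "n5": [0.5, 0.5, 0.0],
}
print(top_k_similar([1.0, 0.0, 0.0], notes))  # ['n1', 'n2', 'n5']
```

An empty result from this function corresponds to the "no similar notes" branch of Step 2.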

The updated and corrected paths never overwrite or delete old notes. They store a new note and set the old note's superseded_by field to the new note's ID. This preserves the full history, consistent with the Zettelkasten principle of evolution over deletion, while ensuring recall() returns current intelligence by default.
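The decision table maps onto a small dispatch function. In this sketch the LLM judgment is injected as a hypothetical `decide` callable returning an `(operation, target_note_id)` pair, and `ToyStore` is an in-memory stand-in for ZettelForge's SQLite/LanceDB storage; entity indexing is omitted. Only the bookkeeping from the table (store, supersede, or skip) is modeled.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Note:
    id: str
    content: str
    superseded_by: Optional[str] = None


class ToyStore:
    """In-memory stand-in for the real note storage (hypothetical)."""

    def __init__(self) -> None:
        self.notes: dict[str, Note] = {}
        self._ids = itertools.count(1)

    def add(self, content: str) -> Note:
        note = Note(id=f"note-{next(self._ids)}", content=content)
        self.notes[note.id] = note
        return note


# decide(fact, similar_notes) -> (operation, ID of the note it refers to, or None)
Decide = Callable[[str, list[Note]], tuple[str, Optional[str]]]


def apply_fact(
    store: ToyStore, fact: str, similar_ids: list[str], decide: Decide
) -> tuple[Optional[Note], str]:
    if not similar_ids:
        # No similar notes: store directly; decide() is not consulted.
        return store.add(fact), "added"
    op, target_id = decide(fact, [store.notes[i] for i in similar_ids])
    if op == "noop":
        return None, "noop"  # Fact already captured; nothing stored.
    if op not in ("added", "updated", "corrected"):
        raise ValueError(f"unknown operation: {op}")
    if op != "added" and target_id not in similar_ids:
        raise ValueError("updated/corrected must name one of the similar notes")
    new_note = store.add(fact)
    if op != "added":
        # Old note content is left as-is; only its successor link is set.
        store.notes[target_id].superseded_by = new_note.id
    return new_note, op


store = ToyStore()
old, status = apply_fact(store, "Server ALPHA is compromised", [], lambda f, ns: ("added", None))
print(status)  # added
new, status = apply_fact(
    store, "Server ALPHA has been remediated", [old.id], lambda f, ns: ("corrected", ns[0].id)
)
print(status, store.notes[old.id].superseded_by)  # corrected note-2
```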

remember_with_extraction() returns a list of (MemoryNote | None, status) tuples, one per extracted fact. Status is one of "added", "updated", "corrected", or "noop".
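A caller often wants to know how many facts actually produced new notes. The sketch below tallies the returned `(note, status)` tuples. `MemoryNote` here is a minimal hypothetical stand-in for ZettelForge's class, and `summarize` is an illustrative helper, not part of the library.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryNote:  # Hypothetical minimal stand-in; the real class has more fields.
    id: str
    content: str


def summarize(
    results: list[tuple[Optional[MemoryNote], str]],
) -> tuple[Counter, list[MemoryNote]]:
    """Count statuses and collect the notes that were actually stored."""
    counts = Counter(status for _, status in results)
    stored = [note for note, _ in results if note is not None]
    return counts, stored


results = [
    (MemoryNote("n-1", "APT28 shifted to edge device exploitation"), "added"),
    (MemoryNote("n-2", "APT28 dropped DROPBEAR malware"), "updated"),
    (None, "noop"),
]
counts, stored = summarize(results)
print(dict(counts))  # {'added': 1, 'updated': 1, 'noop': 1}
```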

How the pipeline affects retrieval

The pipeline's filtering and deduplication directly improve retrieval quality:

Scenario	Without pipeline	With pipeline
Repeated reporting on APT28/Cobalt Strike	47 separate notes	1 authoritative note (46 noop'd)
Contradictory server status notes	Both retrieved equally	Old note superseded; only current returned
Greeting mixed with intelligence	Retrieved as a note	Filtered at min_importance

Fewer, more relevant notes in the retrieval context means the synthesis LLM produces more focused answers and uses fewer tokens per response.

Write latency: fast path and background enrichment

remember() uses a dual-stream write path:

Fast path: embedding, SQLite storage, LanceDB vector write, entity index update, heuristic knowledge graph edges. Returns in approximately 45 ms.
Background (slow) path: LLM causal triple extraction, deferred to a background worker. Not required for the call to return.

Pass sync=True if you need causal enrichment to be visible before your next recall().

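When evolve=True, the full Phase 1 + Phase 2 cycle runs synchronously on the calling thread: remember() does not return until extraction and updating have finished, so its latency includes any LLM calls made in both phases.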
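The fast/background split is a familiar producer-consumer pattern. The sketch below models it with a `queue.Queue` and one worker thread: `remember()` performs the cheap write and returns, while a hypothetical `enrich` callable (standing in for LLM causal triple extraction) runs on the worker unless `sync=True`. `DualStreamWriter`, `enrich`, and `drain()` are illustrative names; this is not ZettelForge's actual worker design.

```python
import queue
import threading
from typing import Callable

Triple = tuple[str, str, str]


class DualStreamWriter:
    def __init__(self, enrich: Callable[[str], list[Triple]]) -> None:
        self._enrich = enrich
        self._jobs: "queue.Queue[str]" = queue.Queue()
        self._lock = threading.Lock()
        self.notes: dict[str, str] = {}
        self.triples: dict[str, list[Triple]] = {}
        self.failed: list[str] = []
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        # Background (slow) path: process enrichment jobs one at a time.
        while True:
            note_id = self._jobs.get()
            try:
                self._store_triples(note_id)
            except Exception:
                # A real system would log and retry; the sketch records the failure.
                with self._lock:
                    self.failed.append(note_id)
            finally:
                self._jobs.task_done()

    def _store_triples(self, note_id: str) -> None:
        with self._lock:
            content = self.notes[note_id]
        triples = self._enrich(content)  # slow call runs outside the lock
        with self._lock:
            self.triples[note_id] = triples

    def remember(self, note_id: str, content: str, sync: bool = False) -> None:
        # Fast path: the cheap write happens before remember() returns.
        with self._lock:
            self.notes[note_id] = content
        if sync:
            self._store_triples(note_id)  # enrichment visible on return
        else:
            self._jobs.put(note_id)  # enrichment happens later on the worker

    def drain(self) -> None:
        """Block until every queued enrichment job has finished."""
        self._jobs.join()


def fake_enrich(text: str) -> list[Triple]:
    return [("APT28", "uses", "Cobalt Strike")] if "Cobalt" in text else []


writer = DualStreamWriter(fake_enrich)
writer.remember("n1", "APT28 uses Cobalt Strike", sync=True)
print(writer.triples["n1"])  # [('APT28', 'uses', 'Cobalt Strike')]
writer.remember("n2", "Server ALPHA remediated")
writer.drain()
print(writer.triples["n2"])  # []
```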

Configuration

Two parameters control the pipeline's selectivity:

Parameter	Default	Effect
`max_facts`	5 (`remember_with_extraction`), 10 (`remember_report`)	Maximum facts extracted per call. Higher values capture more from long reports but add LLM calls in Phase 2.
`min_importance`	3	Threshold below which facts are discarded. Raise to 7 for high-confidence-only ingestion; lower to 1 to store more context at the cost of noise.

For report ingestion via remember_report(), max_facts defaults to 10 and chunk_size defaults to 3,000 characters. Each chunk runs through the full two-phase pipeline independently, so a 9,000-character report split into 3 chunks yields up to 30 facts (10 per chunk).
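The chunk arithmetic can be checked with a fixed-width splitter. This is a simplification: `chunk_text` and `max_candidate_facts` are hypothetical helpers, and ZettelForge's real chunker may split on paragraph or sentence boundaries, which would change chunk counts for reports that are not exact multiples of `chunk_size`.

```python
def chunk_text(text: str, chunk_size: int = 3000) -> list[str]:
    """Split text into consecutive fixed-width chunks (the last may be shorter)."""
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


def max_candidate_facts(text: str, chunk_size: int = 3000, max_facts: int = 10) -> int:
    """Upper bound on facts extracted for one report, before min_importance filtering."""
    return len(chunk_text(text, chunk_size)) * max_facts


report = "x" * 9000
print(len(chunk_text(report)), max_candidate_facts(report))  # 3 30
```

Under this fixed-width splitter, a 9,001-character report produces a fourth, one-character chunk and raises the bound to 40; facts below min_importance are still discarded afterwards.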

Why not modify or delete old notes?

Each note in ZettelForge carries a temporal identity: it reflects what was believed true at the time of ingestion. Deleting it loses that provenance. Instead, the pipeline:

Creates a new note capturing the current understanding.
Marks the old note as superseded_by with the new note's ID.

recall() filters superseded notes by default. Analysts who need the full history can pass include_superseded=True. This design mirrors how intelligence assessments work in practice: previous assessments are not destroyed when new ones arrive, but they are no longer the authoritative answer.
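The default filtering, and the history an analyst gets back with include_superseded=True, can be pictured as post-processing on ranked results. A minimal sketch, assuming each note exposes a `superseded_by` attribute that is `None` for current notes. `filter_recall` and `version_chain` are hypothetical helpers, not ZettelForge's implementation, which may apply the filter inside the query instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    id: str
    content: str
    superseded_by: Optional[str] = None


def filter_recall(ranked: list[Note], include_superseded: bool = False) -> list[Note]:
    """Drop superseded notes unless full history is requested; ranking order is kept."""
    if include_superseded:
        return list(ranked)
    return [note for note in ranked if note.superseded_by is None]


def version_chain(note_id: str, notes: dict[str, Note]) -> list[str]:
    """Follow superseded_by links from note_id; the last ID is the current assessment."""
    chain = [note_id]
    while (successor := notes[chain[-1]].superseded_by) is not None:
        if successor in chain:
            raise ValueError(f"cycle in superseded_by links at {successor}")
        chain.append(successor)
    return chain


notes = {
    "alpha-jan": Note("alpha-jan", "Server ALPHA is compromised", superseded_by="alpha-mar"),
    "alpha-mar": Note("alpha-mar", "Server ALPHA has been remediated"),
}
ranked = list(notes.values())
print([n.id for n in filter_recall(ranked)])  # ['alpha-mar']
print(version_chain("alpha-jan", notes))  # ['alpha-jan', 'alpha-mar']
```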

When to enable evolution

Scenario	Recommendation
Ingesting CTI reports over time	`remember_report()` — chunked, evolved by default
One-time factual storage	`remember(evolve=False)` — skip extraction overhead
Conversational agent with long context	`remember(evolve=True)` — removes conversational noise
Bulk historical import	`remember(evolve=False)` — speed; run evolution later via `evolve_note()`