Retrieval policies¶

Modules: zettelforge.intent_classifier, zettelforge.vector_retriever, zettelforge.graph_retriever, zettelforge.blended_retriever

When you call mm.recall(), ZettelForge classifies your query's intent, routes it through parallel vector and graph retrievers, and blends the results using intent-specific policy weights. This page documents every parameter and formula in that pipeline for ZettelForge 2.7.0.

Intent types¶

from enum import Enum

# Defined in zettelforge.intent_classifier; import it with:
#   from zettelforge.intent_classifier import QueryIntent
class QueryIntent(Enum):
    FACTUAL = "factual"
    TEMPORAL = "temporal"
    RELATIONAL = "relational"
    CAUSAL = "causal"
    EXPLORATORY = "exploratory"
    UNKNOWN = "unknown"

Intent	Description	Example query
`FACTUAL`	Entity lookup, direct fact retrieval	`"What CVE was used in the SolarWinds attack?"`
`TEMPORAL`	Time-based queries, timeline reconstruction	`"What changed since the last incident?"`
`RELATIONAL`	Graph traversal, relationship mapping	`"Who uses Cobalt Strike?"`
`CAUSAL`	Cause-effect chains	`"Why did the attacker pivot to the domain controller?"`
`EXPLORATORY`	General exploration, broad context	`"Tell me about APT28"`
`UNKNOWN`	Ambiguous or unclassifiable	Fallback when no keywords match

Intent classification¶

Keyword matching¶

The classifier counts keyword hits against the query (lowercased) for each intent.

FACTUAL keywords

"cve-"  "cve "  "vulnerability"  "exploit"  "threat"
"what is"  "what was"  "who is"  "name"  "identify"

TEMPORAL keywords

"when"  "timeline"  "since"  "before"  "after"  "changed"
"history"  "previously"  "earlier"  "latest"  "recent"

RELATIONAL keywords

"who uses"  "who targets"  "who conducts"  "what tools does"  "what malware does"
"what technique"  "what cve does"  "used by"  "attributed to"  "related to"
"connected to"  "associated with"  "linked to"  "uses tool"  "between"
"relationship"  "connection"  "link"  "which actor"  "which group"  "which apt"
"what does"

CAUSAL keywords

"why"  "because"  "caused by"  "enables"  "leads to"
"results in"  "due to"  "reason for"

EXPLORATORY keywords

"tell me about"  "explain"  "describe"  "overview"  "information on"
"details about"  "context"  "summarize"  "what do we know"  "brief"

Classification decision tree¶

1. Score each intent: count keyword hits in lowercased query.
2. best_score = max score across all intents.
   competing = count of other intents with score > 0.

3. If best_score >= 2:
      → return best intent, confidence = min(1.0, best_score / 4), method = "keyword"

4. If best_score == 1 AND competing == 0:
      → return best intent, confidence = 0.6, method = "keyword_unambiguous"
         (clear single-keyword match with no competing signal)

5. If use_llm_fallback is True:
      → call LLM, return intent, confidence = 0.8, method = "llm"
         (on LLM failure: EXPLORATORY, confidence = 0.5, method = "llm_fallback")

6. Default fallback:
      → EXPLORATORY, confidence = 0.3, method = "default"

The global IntentClassifier instance (returned by get_intent_classifier()) is created with use_llm_fallback=False. The LLM fallback is opt-in: create your own instance with IntentClassifier(use_llm_fallback=True) to enable it. When enabled, the LLM call uses max_tokens=20 and temperature=0.1.
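The keyword scoring and decision tree above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: the name `classify_sketch`, the use of substring matching to count hits, and the tie-breaking order (first intent listed wins) are all hypothetical. The LLM branch (step 5) is left out, which mirrors the global classifier's `use_llm_fallback=False`.

```python
# Keyword lists copied from the tables above.
KEYWORDS: dict[str, list[str]] = {
    "factual": ["cve-", "cve ", "vulnerability", "exploit", "threat",
                "what is", "what was", "who is", "name", "identify"],
    "temporal": ["when", "timeline", "since", "before", "after", "changed",
                 "history", "previously", "earlier", "latest", "recent"],
    "relational": ["who uses", "who targets", "who conducts", "what tools does",
                   "what malware does", "what technique", "what cve does",
                   "used by", "attributed to", "related to", "connected to",
                   "associated with", "linked to", "uses tool", "between",
                   "relationship", "connection", "link", "which actor",
                   "which group", "which apt", "what does"],
    "causal": ["why", "because", "caused by", "enables", "leads to",
               "results in", "due to", "reason for"],
    "exploratory": ["tell me about", "explain", "describe", "overview",
                    "information on", "details about", "context", "summarize",
                    "what do we know", "brief"],
}


def classify_sketch(query: str) -> tuple[str, float, str]:
    """Return (intent, confidence, method) following steps 1-4 and 6."""
    q = query.lower()
    # Step 1: one hit per keyword found as a substring (assumed matching rule).
    scores = {intent: sum(kw in q for kw in kws) for intent, kws in KEYWORDS.items()}
    # Step 2: best score and number of other intents with any hit.
    best_intent = max(scores, key=scores.get)
    best_score = scores[best_intent]
    competing = sum(1 for i, s in scores.items() if i != best_intent and s > 0)
    # Step 3: two or more hits is a confident keyword match.
    if best_score >= 2:
        return best_intent, min(1.0, best_score / 4), "keyword"
    # Step 4: a single hit counts only when nothing competes with it.
    if best_score == 1 and competing == 0:
        return best_intent, 0.6, "keyword_unambiguous"
    # Step 6: default fallback (step 5, the LLM call, is omitted here).
    return "exploratory", 0.3, "default"
```

Under these assumptions, `classify_sketch("What changed since the last incident?")` matches "since" and "changed" and returns `("temporal", 0.5, "keyword")`, while `classify_sketch("Who uses Cobalt Strike?")` matches only "who uses" and returns `("relational", 0.6, "keyword_unambiguous")`.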

Policy weights¶

Each intent maps to a policy dict that controls how results from each retrieval source are weighted.

Intent	`vector`	`entity_index`	`graph`	`temporal`	`top_k`
`FACTUAL`	0.3	0.7	0.2	0.0	3
`TEMPORAL`	0.2	0.1	0.2	0.5	5
`RELATIONAL`	0.2	0.2	0.5	0.1	10
`CAUSAL`	0.1	0.1	0.6	0.2	10
`EXPLORATORY`	0.5	0.2	0.2	0.1	10
`UNKNOWN`	0.4	0.2	0.2	0.2	5

Weights are not required to sum to 1.0: the FACTUAL row sums to 1.2, while every other row sums to 1.0. They control relative contribution to the blended score after per-source normalization. top_k is the suggested result count; the caller may override it.

The FACTUAL graph weight (0.2) is non-zero even though FACTUAL is an entity lookup. Many factual CTI questions require a single graph hop to answer — for example, "What CVE does APT28 exploit?" traverses a targets edge.
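For quick reference, the table above can be written out as a plain mapping. This is a sketch only: the name `POLICIES`, the helper `weight_sum`, and the exact shape of the dict returned by the library are assumptions; the numbers are copied from the table.

```python
# Policy table transcribed from the documentation (names are illustrative).
POLICIES: dict[str, dict[str, float]] = {
    "factual":     {"vector": 0.3, "entity_index": 0.7, "graph": 0.2, "temporal": 0.0, "top_k": 3},
    "temporal":    {"vector": 0.2, "entity_index": 0.1, "graph": 0.2, "temporal": 0.5, "top_k": 5},
    "relational":  {"vector": 0.2, "entity_index": 0.2, "graph": 0.5, "temporal": 0.1, "top_k": 10},
    "causal":      {"vector": 0.1, "entity_index": 0.1, "graph": 0.6, "temporal": 0.2, "top_k": 10},
    "exploratory": {"vector": 0.5, "entity_index": 0.2, "graph": 0.2, "temporal": 0.1, "top_k": 10},
    "unknown":     {"vector": 0.4, "entity_index": 0.2, "graph": 0.2, "temporal": 0.2, "top_k": 5},
}


def weight_sum(intent: str) -> float:
    """Sum the four source weights for an intent, ignoring top_k."""
    return sum(value for key, value in POLICIES[intent].items() if key != "top_k")
```

Calling `weight_sum` on each intent is a simple way to check how heavily a policy leans on one source relative to the others.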

VectorRetriever¶

Module: zettelforge.vector_retriever

Constructor defaults¶

Parameter	Default	Description
`similarity_threshold`	`0.15`	Minimum cosine similarity to include a result
`entity_boost`	`2.5`	Per-overlapping-entity score multiplier
`exact_match_boost`	`1.0`	Multiplier when a query entity appears verbatim in note content (default has no effect)
`regenerate_invalid_embeddings`	`True`	Re-embed notes with invalid vectors before scoring

Score calculation¶

Cosine similarity (module-level function):

import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0.0:
        return 0.0  # either vector has zero norm
    return float(np.dot(a, b) / norm_product)

Embedding model: nomic-embed-text-v1.5-Q, 768 dimensions, generated in-process via fastembed ONNX.

Entity boost (applied post-retrieval):

boost = entity_boost ** overlap_count   # default 2.5 per overlapping entity
boost *= exact_match_boost              # default 1.0 — no effective change
score = cosine_similarity * boost

Here, overlap_count is the number of query entities that also appear in the note's entity set. Because the boost grows exponentially with overlap_count, boosted scores can exceed 1.0.

Similarity threshold: notes scoring below similarity_threshold (default 0.15) are excluded.
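A short worked example may make the boost arithmetic concrete. The function name `boosted_score` is hypothetical; the formula and defaults are the ones given above.

```python
def boosted_score(
    cosine: float,
    overlap_count: int,
    entity_boost: float = 2.5,
    exact_match_boost: float = 1.0,
) -> float:
    """Apply the entity boost to a raw cosine similarity."""
    boost = entity_boost ** overlap_count  # 1.0 when there is no overlap
    boost *= exact_match_boost             # no effect at the default of 1.0
    return cosine * boost


# A note with cosine similarity 0.4:
no_overlap = boosted_score(0.4, 0)   # 0.4 * 1.0
one_entity = boosted_score(0.4, 1)   # 0.4 * 2.5
two_entities = boosted_score(0.4, 2) # 0.4 * 6.25
```

With two overlapping entities the score rises from 0.4 to about 2.5, which illustrates why the blending stage normalizes vector scores before applying policy weights.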

Link expansion¶

After scoring, VectorRetriever adds directly linked notes with a decayed score:

linked_note_score = parent_score * 0.5

This "hop decay" lets a strong match pull in its immediate neighbors at half the parent's score. The expanded result list is capped at k * 2 entries.
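Link expansion can be sketched as follows. Several details here are assumptions rather than documented behavior: the function name `expand_links`, representing links as a dict of note IDs, skipping linked notes that are already in the result set, and re-sorting before the `k * 2` cap is applied.

```python
def expand_links(
    scored: list[tuple[str, float]],
    links: dict[str, list[str]],
    k: int,
    decay: float = 0.5,
) -> list[tuple[str, float]]:
    """Add directly linked notes at parent_score * decay, capped at k * 2."""
    expanded = list(scored)
    seen = {note_id for note_id, _ in scored}
    for parent_id, parent_score in scored:
        for linked_id in links.get(parent_id, []):
            if linked_id in seen:
                continue  # assumed: a note already present keeps its own score
            seen.add(linked_id)
            expanded.append((linked_id, parent_score * decay))
    # Assumed: results are re-sorted by score before the cap is applied.
    expanded.sort(key=lambda pair: pair[1], reverse=True)
    return expanded[: k * 2]


example = expand_links(
    scored=[("a", 0.8), ("b", 0.6)],
    links={"a": ["c", "b"], "b": ["d"]},
    k=2,
)
```

In this example, note "c" enters at 0.8 * 0.5 and note "d" at 0.6 * 0.5, and all four notes fit within the cap of `k * 2 = 4`.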

LanceDB vs. in-memory¶

Mode	Condition	Score origin
LanceDB	`use_lancedb=True` and `store.lancedb is not None`	`similarity = 1.0 - _distance` (LanceDB returns L2 distance)
In-memory	LanceDB unavailable or `use_lancedb=False`	Direct `cosine_similarity(query_vector, note_vector)`

The entity boost and link expansion are applied in both modes.

Embedding validation¶

Before scoring, VectorRetriever checks each note's embedding:

Check	Condition	Action
Null vector	`vector is None`	Regenerate
Wrong dimensions	`len(vector) != 768`	Regenerate
All zeros	`all(v == 0.0 for v in vector)`	Regenerate
Low variance	`np.var(vector) < 0.001`	Regenerate

Regeneration calls get_embedding(note.content.raw[:1000]). Notes that fail regeneration are skipped.
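The four checks in the table can be expressed as a single predicate. This is a minimal sketch: the function name `embedding_is_valid` and the constant `EXPECTED_DIM` are hypothetical, and the order of the checks is an assumption.

```python
import numpy as np

EXPECTED_DIM = 768  # dimension of the embedding model described above


def embedding_is_valid(vector) -> bool:
    """Return False when any of the four documented checks would trigger regeneration."""
    if vector is None:
        return False                      # null vector
    if len(vector) != EXPECTED_DIM:
        return False                      # wrong dimensions
    if all(v == 0.0 for v in vector):
        return False                      # all zeros
    if np.var(vector) < 0.001:
        return False                      # low variance (near-constant vector)
    return True
```

A constant vector such as `[0.5] * 768` passes the first three checks but fails the variance check, which is the case the low-variance rule is there to catch.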

Return type¶

retrieve() accepts return_scores: bool = False:

False (default): returns list[MemoryNote]
True: returns list[tuple[MemoryNote, float]] — required by BlendedRetriever.blend()

GraphRetriever¶

Module: zettelforge.graph_retriever

Graph source¶

GraphRetriever accepts any object implementing the GraphSource protocol:

class GraphSource(Protocol):
    def get_node(self, entity_type: str, entity_value: str) -> dict | None: ...
    def get_node_by_id(self, node_id: str) -> dict | None: ...
    def get_outgoing_edges(self, node_id: str) -> list[dict]: ...

The concrete implementation used in production is StoreGraphSource, which wraps a StorageBackend and reads a knowledge graph scoped to that store. Per-store scoping prevents phantom note IDs, that is, IDs that come from another store's graph data.

ScoredResult¶

@dataclass
class ScoredResult:
    note_id: str
    score: float
    hops: int
    path: list[str]  # e.g. ["actor:APT28", "tool:Mimikatz", "note:note_20240315_..."]

BFS traversal¶

retrieve_note_ids(query_entities, max_depth=2) accepts query_entities as dict[str, list[str]] (entity type to list of values). It starts a BFS from each entity node and scores every note node it reaches:

score = 1.0 / (1.0 + hop_distance)

Hops	Score
0	1.000
1	0.500
2	0.333
3	0.250

When multiple paths reach the same note, the path with the highest score (fewest hops) wins. Results are returned sorted by score descending. With the default max_depth=2, the 3-hop row applies only when the caller raises max_depth.
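The traversal and scoring above can be sketched with a toy graph. This is not the library's code: the adjacency dict stands in for a GraphSource, the function name `bfs_note_scores` is hypothetical, and identifying note nodes by a `note:` prefix is an assumption based on the path example in ScoredResult.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ScoredResult:
    note_id: str
    score: float
    hops: int
    path: list[str] = field(default_factory=list)


# Hypothetical toy graph: node id -> ids reachable over outgoing edges.
EDGES = {
    "actor:APT28": ["tool:Mimikatz", "note:n1"],
    "tool:Mimikatz": ["note:n2"],
}


def bfs_note_scores(start_ids, edges, max_depth=2):
    """BFS from each start node; score notes as 1 / (1 + hops), best score wins."""
    best: dict[str, ScoredResult] = {}
    for start in start_ids:
        queue = deque([(start, 0, [start])])
        seen = {start}
        while queue:
            node, hops, path = queue.popleft()
            if node.startswith("note:"):
                score = 1.0 / (1.0 + hops)
                if node not in best or score > best[node].score:
                    best[node] = ScoredResult(node, score, hops, path)
                continue  # notes are leaves in this sketch
            if hops == max_depth:
                continue  # do not expand beyond max_depth
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1, path + [nxt]))
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


results = bfs_note_scores(["actor:APT28"], EDGES)
```

Here `note:n1` is one hop from the actor and scores 0.5, while `note:n2` is reached through the tool node at two hops and scores about 0.333.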

BlendedRetriever¶

Module: zettelforge.blended_retriever (v2.3.1)

The v2.3.1 update replaced positional scoring with normalized score fusion.

blend() — normalized score fusion¶

def blend(
    self,
    vector_results: list[tuple],      # List[Tuple[MemoryNote, float]] — actual similarity scores
    graph_results: list[ScoredResult],
    policy: dict[str, float],
    note_lookup: Callable[[str], MemoryNote | None],
    k: int = 10,
) -> list[MemoryNote]

Algorithm:

1. Normalize vector scores to [0, 1] via min-max normalization.
   If all scores are equal: assign uniform 0.5.

2. Normalize graph scores to [0, 1] via min-max normalization.

3. Vector signal:
     For each (note, norm_score) in normalized vector results:
       blended = norm_score * policy["vector"]
       scores[note.id] = (blended, note)

4. Graph signal:
     For each (note_id, norm_score) in normalized graph results:
       graph_score = norm_score * policy["graph"]
       If note_id already in scores:
         scores[note_id] = (existing_score + graph_score, existing_note)  ← both-source bonus
       Else:
         note = note_lookup(note_id)
         If found: scores[note_id] = (graph_score, note)

5. Sort by blended score descending, return top-k notes.

Notes found by both vector and graph retrieval receive additive scores from both signals. This "both-source bonus" ranks notes that are semantically similar and graph-proximate above notes from only one source.

Min-max normalization formula:

normalized = (score - min_score) / (max_score - min_score)
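The fusion steps can be sketched over plain note IDs. This is a simplified illustration: the names `min_max` and `blend_sketch` are hypothetical, notes are represented by their IDs instead of MemoryNote objects (so no note_lookup is needed), and applying the uniform 0.5 fallback to graph scores as well as vector scores is an assumption.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize to [0, 1]; uniform 0.5 when all scores are equal."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {note_id: 0.5 for note_id in scores}
    return {note_id: (s - lo) / (hi - lo) for note_id, s in scores.items()}


def blend_sketch(
    vector_scores: dict[str, float],
    graph_scores: dict[str, float],
    policy: dict[str, float],
    k: int = 10,
) -> list[tuple[str, float]]:
    """Weighted additive fusion of normalized vector and graph scores."""
    # Vector signal.
    blended = {nid: s * policy["vector"] for nid, s in min_max(vector_scores).items()}
    # Graph signal: added to the vector score when the note appears in both.
    for nid, s in min_max(graph_scores).items():
        blended[nid] = blended.get(nid, 0.0) + s * policy["graph"]
    ranked = sorted(blended.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


ranked = blend_sketch(
    vector_scores={"a": 0.9, "b": 0.5, "c": 0.1},
    graph_scores={"b": 1.0, "d": 0.5},
    policy={"vector": 0.2, "graph": 0.5},  # RELATIONAL weights
)
```

With the RELATIONAL weights, note "b" ranks first at roughly 0.6 (0.5 * 0.2 from the vector signal plus 1.0 * 0.5 from the graph signal), ahead of the top vector-only hit "a" at 0.2.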

blend_rrf() — Reciprocal Rank Fusion¶

An alternative merge method that is robust to score scale differences across retrievers:

def blend_rrf(
    self,
    vector_results: list[tuple],    # List[Tuple[MemoryNote, float]]
    graph_results: list[ScoredResult],
    note_lookup: Callable[[str], MemoryNote | None],
    k: int = 10,
    rrf_k: int = 60,               # standard RRF constant
) -> list[MemoryNote]

RRF score for each note: sum(1 / (rrf_k + rank)) across all signals in which the note appears, where rank is the note's 1-based position in that signal's result list. A rank-1 vector result at the default rrf_k=60 scores 1/61 ≈ 0.016. RRF is also available as a fusion method in production search systems such as Elasticsearch and Vespa. Notes found in both signals still receive an additive bonus, because their per-signal terms are summed.
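The RRF formula can be sketched over ranked lists of note IDs. The function name `rrf_sketch` and the use of plain ID lists (already sorted best-first) are assumptions made for illustration.

```python
def rrf_sketch(
    vector_ranked: list[str],
    graph_ranked: list[str],
    k: int = 10,
    rrf_k: int = 60,
) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: sum 1 / (rrf_k + rank) over each list a note appears in."""
    scores: dict[str, float] = {}
    for ranking in (vector_ranked, graph_ranked):
        for rank, note_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[note_id] = scores.get(note_id, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:k]


fused = rrf_sketch(vector_ranked=["a", "b", "c"], graph_ranked=["b", "d"])
```

Note "b" is second in the vector list and first in the graph list, so it scores 1/62 + 1/61 and outranks "a", which appears only once at 1/61. Only ranks enter the formula, so the raw score scales of the two retrievers never need to be reconciled.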

Full pipeline¶

Query
  |
  v
IntentClassifier.classify(query)
  |
  +-- QueryIntent + confidence + method
  |
  v
IntentClassifier.get_traversal_policy(intent)
  |
  +-- policy weights dict
  |
  v
[Parallel]
  |
  +-- VectorRetriever.retrieve(query, domain, k, return_scores=True)
  |     cosine similarity (LanceDB or in-memory)
  |     → entity boost (2.5x per overlapping entity)
  |     → link expansion (0.5x score decay for linked notes)
  |     → filter below similarity_threshold (0.15)
  |     → List[Tuple[MemoryNote, float]]
  |
  +-- GraphRetriever.retrieve_note_ids(query_entities, max_depth=2)
  |     BFS from resolved entity nodes via StoreGraphSource
  |     score = 1/(1+hops), best score per note wins
  |     → List[ScoredResult]
  |
  v
BlendedRetriever.blend(vector_results, graph_results, policy, note_lookup, k)
  |
  +-- Min-max normalize vector scores
  +-- Min-max normalize graph scores
  +-- Weighted additive fusion (both-source bonus for notes found by both)
  +-- Sort descending, truncate to top-k
  |
  v
List[MemoryNote]

Configuration reference¶

These config keys control retrieval pipeline behavior. Full defaults and env var overrides are in the Configuration reference.

Config key	Default	Effect
`retrieval.similarity_threshold`	`0.25`	Config-level threshold (VectorRetriever constructor overrides to `0.15`)
`retrieval.entity_boost`	`2.5`	Score multiplier per overlapping entity
`retrieval.max_graph_depth`	`2`	Max BFS hops in GraphRetriever
`retrieval.max_results`	`10`	Default `k` passed to retrieval calls
`retrieval.use_lancedb`	`true`	Prefer LanceDB when available