Architecture deep dive

ZettelForge v2.7.0 is a hybrid-storage agentic memory system with 107 Python source files organized into 17 packages. It processes threat intelligence through extraction, storage, retrieval, and synthesis pipelines running entirely on-device by default.

Module organization

ZettelForge has 17 packages under src/zettelforge/:

Package	Files	Function
Root (`zettelforge/`)	43	Core: memory_manager, config, note_schema, retrieval, synthesis, KG
`detection/`	4	Detection rule base + consumer protocol
`integrations/`	3	External system connectors
`llm_providers/`	7	Provider adapters (local, Ollama, mock, NVIDIA, litellm)
`mcp/`	3	MCP server (FastMCP protocol 2024-11-05)
`osint/` + collectors	23	Passive OSINT enrichment pipeline
`scripts/`	6	Migration and maintenance tools
`sigma/`	7	Sigma rule ingestion, entities, tags, CLI
`yara/`	8	YARA rule ingestion, CCCS metadata, CLI

Most-imported modules (by importer count; includes standard-library modules):

zettelforge.log — structured logging via structlog (19 importers)
pathlib.Path — path handling (11 importers)
threading — background workers (9 importers)
datetime.datetime — temporal tracking (9 importers)
zettelforge.note_schema.MemoryNote — core data type (8 importers)

Data pipeline

Write path (remember())

MemoryManager.remember() uses a dual-stream write path (MAGMA-inspired):

Content
  └─ 1. Entity Extraction    (EntityExtractor — regex + optional LLM NER)
  └─ 2. Governance Validation (GovernanceValidator — size, injection, PII, anomaly)
  └─ 3. Alias Resolution     (AliasResolver — normalizes actor/CVE names)
  └─ 4. Dual-Stream Write
         ├─ Fast path → SQLite (notes + kg_nodes/kg_edges) + LanceDB (embeddings)
         │              Returns in ~45ms (fastembed in-process)
         └─ Slow path → Background worker (causal triple extraction, LLM NER)
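
The split between a synchronous fast path and a deferred slow path can be sketched with a queue and a worker thread. This is a minimal illustration, not ZettelForge's implementation: the in-memory dictionaries stand in for SQLite and LanceDB, and the names remember, slow_worker, fast_store, and enriched are hypothetical.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the SQLite/LanceDB stores.
fast_store = {}
enriched = {}

slow_queue = queue.Queue()


def slow_worker():
    """Drain the queue and run expensive enrichment off the caller's thread."""
    while True:
        item = slow_queue.get()
        if item is None:  # sentinel: shut the worker down
            slow_queue.task_done()
            break
        note_id, content = item
        # Placeholder for causal triple extraction / LLM NER:
        # here we only collect capitalized tokens.
        enriched[note_id] = [w for w in content.split() if w[0].isupper()]
        slow_queue.task_done()


def remember(note_id, content):
    """Fast path: persist synchronously, then enqueue slow-path work."""
    fast_store[note_id] = content       # synchronous write, caller waits for this
    slow_queue.put((note_id, content))  # deferred enrichment, caller does not wait
    return note_id


worker = threading.Thread(target=slow_worker, daemon=True)
worker.start()

remember("n1", "APT28 used Mimikatz against Contoso")
slow_queue.join()     # wait for background work (for demonstration only)
slow_queue.put(None)  # stop the worker
```

The caller only pays for the synchronous write; enrichment results appear later, once the worker has processed the queued item.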

Read path (recall())

MemoryManager.recall() routes through intent classification then blended retrieval:

Query
  └─ IntentClassifier  (keyword heuristics + vector fallback)
  └─ Three retrievers run in parallel:
         ├─ VectorRetriever   (LanceDB cosine similarity + entity boost)
         ├─ GraphRetriever    (BFS from query entities in SQLite KG)
         └─ EntityRetriever   (exact-match lookup in EntityExtractor index)
  └─ BlendedRetriever.blend() (min-max normalized score fusion)
  └─ Cross-encoder rerank     (ms-marco-MiniLM-L-6-v2, 8 candidates, 256 chars)
  └─ Results
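
Running the three retrievers in parallel can be sketched with a thread pool. The retriever functions below are hypothetical stubs that return fixed note scores; the real classes query LanceDB and SQLite, and how ZettelForge schedules them is not shown in this page.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stub retrievers: each maps a query to {note_id: raw_score}.
def vector_retriever(query):
    return {"note-1": 0.82, "note-2": 0.40}


def graph_retriever(query):
    return {"note-2": 3.0}


def entity_retriever(query):
    return {"note-3": 1.0}


RETRIEVERS = {
    "vector": vector_retriever,
    "graph": graph_retriever,
    "entity": entity_retriever,
}


def run_retrievers(query):
    """Submit every retriever at once and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(RETRIEVERS)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in RETRIEVERS.items()}
        return {name: future.result() for name, future in futures.items()}


results = run_retrievers("What tools does APT28 use?")
```

The per-retriever score dictionaries are what a fusion step would then normalize and combine.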

Storage architecture

ZettelForge uses three storage layers by default:

Layer	Technology	Purpose
Structured	SQLite (`~/.amem/`)	Notes, KG nodes/edges, entities
Vector	LanceDB	768-dim embeddings for semantic search
Graph (direct access)	JSONL (`kg_nodes.jsonl`, `kg_edges.jsonl`)	KnowledgeGraph class for graph queries

SQLite schema (default backend)

notes table — 35 columns including:

id, created_at, updated_at — lifecycle
content_raw, source_type, source_ref — content and provenance
embedding_vector, embedding_model — vector metadata
entities, domain, tier, confidence — semantic classification
superseded_by, supersedes — versioning chain

kg_nodes table: node_id, entity_type, entity_value, properties, created_at, updated_at

kg_edges table: edge_id, from_node_id, to_node_id, relationship, edge_type, note_id, properties, created_at, updated_at

Indexes on (entity_type, entity_value) for nodes and on from_node_id, to_node_id, relationship, edge_type for edges.
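
The KG tables and indexes described above can be reproduced in an in-memory database for experimentation. The column names come from this page; the column types, index names, and the bfs helper are assumptions made for the sketch.

```python
import sqlite3
from collections import deque

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE kg_nodes (
        node_id      TEXT PRIMARY KEY,
        entity_type  TEXT NOT NULL,
        entity_value TEXT NOT NULL,
        properties   TEXT,
        created_at   TEXT,
        updated_at   TEXT
    );
    CREATE TABLE kg_edges (
        edge_id      TEXT PRIMARY KEY,
        from_node_id TEXT NOT NULL,
        to_node_id   TEXT NOT NULL,
        relationship TEXT NOT NULL,
        edge_type    TEXT,
        note_id      TEXT,
        properties   TEXT,
        created_at   TEXT,
        updated_at   TEXT
    );
    CREATE INDEX idx_nodes_entity ON kg_nodes (entity_type, entity_value);
    CREATE INDEX idx_edges_from   ON kg_edges (from_node_id);
    CREATE INDEX idx_edges_to     ON kg_edges (to_node_id);
    CREATE INDEX idx_edges_rel    ON kg_edges (relationship);
    CREATE INDEX idx_edges_type   ON kg_edges (edge_type);
    """
)

conn.executemany(
    "INSERT INTO kg_nodes (node_id, entity_type, entity_value) VALUES (?, ?, ?)",
    [("a", "actor", "apt28"), ("b", "tool", "mimikatz"), ("c", "cve", "CVE-2023-23397")],
)
conn.executemany(
    "INSERT INTO kg_edges (edge_id, from_node_id, to_node_id, relationship) "
    "VALUES (?, ?, ?, ?)",
    [("e1", "a", "b", "uses"), ("e2", "b", "c", "exploits")],
)


def bfs(conn, start, max_depth=2):
    """Breadth-first traversal over outgoing edges, bounded by max_depth."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        rows = conn.execute(
            "SELECT to_node_id FROM kg_edges WHERE from_node_id = ?", (node,)
        ).fetchall()
        for (neighbor,) in rows:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen
```

The index on from_node_id is what keeps each hop of the traversal an indexed lookup rather than a table scan.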

Two KG paths

MemoryManager.remember() writes KG data through SqliteBackend.add_kg_node() and add_kg_edge(). These writes go to the SQLite kg_nodes/kg_edges tables.

KnowledgeGraph (obtained via zettelforge.get_knowledge_graph()) uses JSONL persistence (kg_nodes.jsonl, kg_edges.jsonl) for direct graph queries: get_neighbors(), traverse(), get_changes_since(). StoreGraphSource wraps the per-store SQLite KG so that reads during recall stay isolated to one store, which prevents phantom note IDs from other stores.

LanceDB configuration

Default model: nomic-ai/nomic-embed-text-v1.5-Q (768-dim ONNX via fastembed)
Index type: IVF_FLAT (avoids double-quantization artifacts)
Provider: fastembed (in-process, ~7ms/embed) or Ollama (HTTP, ~30ms/embed)
Fallback: deterministic mock embeddings (offline/CI use)
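
A deterministic mock embedding can be built from a hash of the input, so the same text always maps to the same vector without loading a model. The sketch below is one way to do this and is not taken from ZettelForge's source; the function name mock_embed and the hashing scheme are assumptions.

```python
import hashlib
import math


def mock_embed(text, dim=768):
    """Deterministic pseudo-embedding: hash-derived values, L2-normalized."""
    values = []
    counter = 0
    while len(values) < dim:
        # Each SHA-256 digest yields 32 bytes; a counter prefix gives fresh bytes.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend((byte / 127.5) - 1.0 for byte in digest)  # map to [-1, 1]
        counter += 1
    values = values[:dim]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

Such vectors carry no semantic meaning, so similarity scores are arbitrary; they are only useful for exercising the storage and retrieval plumbing in offline or CI runs.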

Entity extraction

EntityExtractor recognizes 19 entity types across three categories:

Category	Types	Extraction method
CTI	`cve`, `intrusion_set`, `actor`, `tool`, `campaign`, `attack_pattern`	Regex
IOC (STIX Cyber Observables)	`ipv4`, `domain`, `url`, `md5`, `sha1`, `sha256`, `email`	Regex
Conversational	`person`, `location`, `organization`, `event`, `activity`, `temporal`	LLM NER (optional; requires LLM provider)

Total: 13 regex patterns + 6 LLM types = 19 entity types.

Selected regex patterns (source: entity_indexer.py):

"cve":           re.compile(r"(CVE-\d{4}-\d{4,})", re.IGNORECASE)
"intrusion_set": re.compile(r"\b((?:apt|unc|ta|fin|temp)\s*-?\s*\d+)\b", re.IGNORECASE)
"attack_pattern": re.compile(r"\b(T\d{4}(?:\.\d{3})?)\b")
"ipv4":          re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:...)\b")

A code-context filter (_CODE_CONTEXT_PATTERN) suppresses hash false-positives from git commit lines, code fences, and function definitions.
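
The three complete patterns listed above can be tried directly. The sketch below copies them verbatim and wraps them in a hypothetical extract_entities helper; the ipv4 pattern is left out because it is elided in the listing, and the code-context filter is not reproduced.

```python
import re

# Patterns copied from the listing above (ipv4 omitted: it is elided there).
PATTERNS = {
    "cve": re.compile(r"(CVE-\d{4}-\d{4,})", re.IGNORECASE),
    "intrusion_set": re.compile(
        r"\b((?:apt|unc|ta|fin|temp)\s*-?\s*\d+)\b", re.IGNORECASE
    ),
    "attack_pattern": re.compile(r"\b(T\d{4}(?:\.\d{3})?)\b"),
}


def extract_entities(text):
    """Return {entity_type: [matches]} for every pattern that matches."""
    found = {}
    for entity_type, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[entity_type] = matches
    return found


report = "APT28 exploited CVE-2023-23397 and used T1566.001 for initial access."
entities = extract_entities(report)
# entities == {"cve": ["CVE-2023-23397"],
#              "intrusion_set": ["APT28"],
#              "attack_pattern": ["T1566.001"]}
```

Matches are returned as written in the text; normalizing them (for example, lowercasing actor names) would be a separate step such as alias resolution.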

Retrieval policies

Intent classification assigns each query to a category that sets retrieval weights:

Intent	Vector	Entity	Graph	Temporal	Use case
FACTUAL	0.3	0.5	0.2	0.0	CVE lookups, specific entity queries
RELATIONAL	0.2	0.2	0.5	0.1	Tool attribution, actor relationships
CAUSAL	0.1	0.1	0.6	0.2	Root cause analysis, kill chain
EXPLORATORY	0.5	0.2	0.2	0.1	General threat landscape research

BlendedRetriever.blend() uses min-max normalized score fusion. blend_rrf() provides Reciprocal Rank Fusion (rrf_k=60) as an alternative. See Retrieval Policies for full keyword lists and fusion formulas.
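
Both fusion strategies can be sketched in their textbook forms. The functions below are illustrative and may differ from the formulas in Retrieval Policies: blend and blend_rrf here are standalone functions, the tie-handling in min_max is an assumption, and the example uses the EXPLORATORY weights from the table with the temporal term left out.

```python
def min_max(scores):
    """Rescale raw scores to [0, 1]; a constant list maps to 1.0 (assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {note_id: 1.0 for note_id in scores}
    return {note_id: (s - lo) / (hi - lo) for note_id, s in scores.items()}


def blend(per_retriever, weights):
    """Weighted sum of min-max normalized scores across retrievers."""
    fused = {}
    for name, scores in per_retriever.items():
        weight = weights.get(name, 0.0)
        for note_id, score in min_max(scores).items():
            fused[note_id] = fused.get(note_id, 0.0) + weight * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


def blend_rrf(rankings, rrf_k=60):
    """Reciprocal Rank Fusion: each list contributes 1 / (rrf_k + rank)."""
    fused = {}
    for ranked_ids in rankings.values():
        for rank, note_id in enumerate(ranked_ids, start=1):
            fused[note_id] = fused.get(note_id, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


raw = {
    "vector": {"a": 0.9, "b": 0.5, "c": 0.1},
    "graph": {"b": 4.0, "c": 1.0},
    "entity": {"c": 1.0},
}
exploratory = {"vector": 0.5, "entity": 0.2, "graph": 0.2}

by_score = blend(raw, exploratory)
by_rank = blend_rrf({"vector": ["a", "b", "c"], "graph": ["b", "c"], "entity": ["c"]})
```

On this toy input the two methods disagree: score fusion ranks a first because of its strong vector score, while RRF ranks c first because it appears in all three lists. That difference is the usual reason to offer both.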

Synthesis system

SynthesisGenerator.synthesize() assembles context from up to 10 notes (500 characters each, 3000 max tokens), then calls the configured LLM. Token budget is estimated as len(text) // 4.
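
The context budget can be illustrated with a small helper. The limits (10 notes, 500 characters, 3000 tokens) and the len(text) // 4 estimate come from the paragraph above; the function names and the stop-at-first-overflow behavior are assumptions.

```python
def estimate_tokens(text):
    """Naive token estimate: roughly four characters per token."""
    return len(text) // 4


def assemble_context(notes, max_notes=10, max_chars=500, max_tokens=3000):
    """Truncate each note, then add notes until the token budget is reached."""
    parts = []
    used = 0
    for note in notes[:max_notes]:
        snippet = note[:max_chars]
        cost = estimate_tokens(snippet)
        if used + cost > max_tokens:
            break  # assumption: stop at the first note that would overflow
        parts.append(snippet)
        used += cost
    return "\n\n".join(parts), used


context, used_tokens = assemble_context(["x" * 800] * 12)
```

Under this estimate, 10 notes of 500 characters cost about 1,250 tokens, so with the stated defaults the note and character caps bind before the 3000-token cap does.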

Four output formats:

Format	Schema	Use case
`direct_answer`	`{answer, confidence, sources}`	Quick facts (ZettelForge OSS, ThreatRecall.ai SaaS)
`synthesized_brief`	`{summary, themes[], confidence}`	Executive summary (ThreatRecall.ai SaaS)
`timeline_analysis`	`{timeline[{date, event}], confidence}`	Incident reconstruction (ThreatRecall.ai SaaS)
`relationship_map`	`{entities[], relationships[]}`	Threat landscape (ThreatRecall.ai SaaS)

direct_answer is the only format available in ZettelForge OSS without a synthesis extension. The other three formats are available in ThreatRecall.ai SaaS. The system falls back to direct_answer when the requested format is not available.

LLM integration

The LLM layer has three embedding providers and a separate generation path:

Embedding providers (in preference order):

fastembed (ONNX, in-process): ~7ms/embed, default — no server required
Ollama (HTTP): ~30ms/embed, optional
Mock: deterministic fallback for offline/test use

Generation (LLM) source defaults:

Setting	Default	Notes
`llm.provider`	`ollama`	Also supports `local` (llama-cpp-python) and `mock`
`llm.model`	`qwen3.5:9b`	Source default in `config.py:123` and `config.default.yaml`; unresolved upstream, so set an explicit model your provider can load
`llm.temperature`	`0.1`	Low for deterministic extraction
`llm.timeout`	`180.0s`	Bumped from 60s at v2.5.2 for 9B reasoning models

Set an explicit LLM model

This docs set has not verified qwen3.5:9b as a working Ollama tag. For Ollama, set llm.model or ZETTELFORGE_LLM_MODEL to an installed tag such as qwen2.5:3b. For the local provider, use a local HuggingFace model ID such as Qwen/Qwen2.5-3B-Instruct-GGUF with the matching GGUF filename.

LLM NER and synthesis are optional. The system runs fully offline with mock embeddings and no generation capability — recall() and entity extraction still work; synthesize() returns a zero-confidence answer.

Performance characteristics

Benchmarks measured on DGX Spark GB10, v2.7.0 baseline, deterministic config (no LLM for judge):

Metric	v2.7.0 baseline
CTI retrieval accuracy	75.0%
LoCoMo accuracy (keyword judge)	7.0%
CTI recall p50 (idle machine)	79ms
recall mean (profiled, 60 calls)	117.6ms
recall p95 (profiled)	258ms
LoCoMo p50 / p95	336ms / 387ms
remember() fast path	~45ms
Embedding (fastembed)	~7ms

Source: benchmarks/BENCHMARK_REPORT.md, session 2026-06-09. The CTI and LoCoMo benchmarks were measured with the keyword-judge path (no synthesis LLM installed on this host).

Key optimizations in v2.4.0+ that produced the current baseline:

Cross-encoder rerank (Xenova/ms-marco-MiniLM-L-6-v2): 8 candidates, 256 chars/doc — accounts for +15pp CTI accuracy
IVF_FLAT index in LanceDB: avoids double-quantization artifacts
fastembed (ONNX in-process) vs Ollama HTTP: ~7ms vs ~30ms per embed
StoreGraphSource per-store KG isolation: eliminates phantom note IDs from mixed-store traversal
MemSAD vectorization: reduces the write-time anomaly gate from an O(n²) computation to ~3.4ms using vectorized numpy pairwise scoring

Security and governance

OCSF audit logging

All operations emit structured OCSF v1.3 events via ocsf.py (GOV-012 compliant):

log_api_activity() — remember/recall/synthesize calls (class 6002)
log_authentication() — auth events (class 3001)
log_authorization() — access decisions (class 3003)
log_config_change() — configuration mutations (class 5001)
log_file_activity() — storage operations (class 4001)
log_process_activity() — background worker events (class 7002)
log_account_change() — account lifecycle (class 3005)

Governance controls (write-time)

Four controls execute on every remember() call in this order:

Size limits (LimitsConfig): max 50MB content, 30s recall timeout
Prompt injection detection (PromptSecurityValidator): 7 deterministic pattern categories, always active, no LLM required
PII detection (PIIValidator): disabled by default; requires pip install zettelforge[pii]; CTI allowlist (IP addresses, URLs, domains never redacted)
Memory anomaly gate (MemSAD): enabled in audit mode by default; scores inbound content against recent store history; quarantines anomalies to JSONL
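
An ordered chain of write-time checks can be sketched as a list of functions that run in sequence. Only the first two controls are shown, the injection pattern is a made-up example rather than one of ZettelForge's rule categories, and rejecting on the first failure is an assumption about behavior.

```python
import re

MAX_BYTES = 50 * 1024 * 1024  # 50MB content limit, as stated above

# Hypothetical pattern for illustration only; not ZettelForge's actual rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]


def check_size(content):
    if len(content.encode("utf-8")) > MAX_BYTES:
        return "content exceeds size limit"
    return None


def check_injection(content):
    for pattern in INJECTION_PATTERNS:
        if pattern.search(content):
            return "possible prompt injection"
    return None


# Order matters: cheap structural checks first. PII detection and the
# anomaly gate would follow here in the same shape.
CONTROLS = [check_size, check_injection]


def validate(content):
    """Run controls in order; return (ok, reason) at the first failure."""
    for control in CONTROLS:
        reason = control(content)
        if reason is not None:
            return False, reason
    return True, None
```

Running the deterministic checks before anything model-dependent keeps the common rejection cases cheap.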

See Governance Controls for full parameter reference.

Epistemic tiers

Tier	Meaning
A (Authoritative)	Verified from a trusted source
B (Operational)	Working knowledge, plausible
C (Support)	Inferred or speculative

Tiers filter retrieval (SynthesisConfig.tier_filter) and propagate to OCSF events.

Secrets handling

Config: ${ENV_VAR} syntax in YAML; resolved at load time via _resolve_env_refs()
Redaction: automatic in repr() for keys matching "key", "token", "secret", "password"
Sensitive config keys never appear in log output or /api/config responses
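
Both mechanisms can be sketched in a few lines. This is not the code behind _resolve_env_refs(): the helper names resolve_env_refs and redact are hypothetical, leaving unset variables untouched is an assumption, and the mask string is arbitrary.

```python
import os
import re

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
SENSITIVE_MARKERS = ("key", "token", "secret", "password")


def resolve_env_refs(value, env=None):
    """Recursively replace ${NAME} in strings with values from the environment."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {k: resolve_env_refs(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(v, env) for v in value]
    if isinstance(value, str):
        # Assumption: references to unset variables are left as written.
        return _ENV_REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    return value


def redact(config):
    """Return a copy that is safe to log: mask values whose key looks sensitive."""
    safe = {}
    for key, value in config.items():
        if isinstance(value, dict):
            safe[key] = redact(value)
        elif any(marker in key.lower() for marker in SENSITIVE_MARKERS):
            safe[key] = "***"
        else:
            safe[key] = value
    return safe


raw = {"llm": {"api_key": "${DEMO_API_KEY}", "model": "qwen2.5:3b"}}
loaded = resolve_env_refs(raw, env={"DEMO_API_KEY": "s3cr3t"})
printable = redact(loaded)
```

Keeping resolution and redaction separate means the resolved secret is available to the provider client while anything destined for logs or API responses goes through redact first.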

Configuration

Resolution order (highest priority first)

Environment variables (ZETTELFORGE_*)
config.yaml in the working directory
config.yaml at the project root
config.default.yaml (reference defaults)
Hardcoded dataclass defaults
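
Layered resolution amounts to merging dictionaries from lowest to highest priority. The sketch below assumes each source has already been parsed into a nested dict; deep_merge and resolve_config are hypothetical helpers, and the sample values reuse defaults quoted elsewhere on this page.

```python
def deep_merge(base, override):
    """Return base updated with override, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(layers):
    """layers is ordered highest priority first; apply lowest priority first."""
    result = {}
    for layer in reversed(layers):
        result = deep_merge(result, layer)
    return result


env_layer = {"llm": {"model": "qwen2.5:3b"}}         # e.g. ZETTELFORGE_LLM_MODEL
cwd_yaml = {"retrieval": {"default_k": 5}}           # config.yaml in working dir
defaults = {
    "llm": {"provider": "ollama", "model": "qwen3.5:9b", "temperature": 0.1},
    "retrieval": {"default_k": 10, "similarity_threshold": 0.25},
}

config = resolve_config([env_layer, cwd_yaml, defaults])
```

Merging per key rather than per section means an override of llm.model leaves llm.provider and llm.temperature at their defaults.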

Key sections

Section	Purpose
`storage`	Data directory (default `~/.amem`)
`backend`	`sqlite` (default) or `typedb` (requires additional extension)
`embedding`	Provider, model, dimensions
`llm`	Provider, model, temperature, timeout
`retrieval`	`default_k` (10), `similarity_threshold` (0.25)
`synthesis`	`max_context_tokens` (3000), `tier_filter`
`governance`	Prompt injection, PII, memory anomaly, limits

See Configuration Reference for every key and its environment variable override.

Public API

29 items in zettelforge.__all__ (v2.7.0):

Classes: BlendedRetriever, Edition, EditionError, ExtractedFact, FactExtractor, GraphRetriever, IntentClassifier, KnowledgeGraph, MemoryManager, MemoryNote, MemoryUpdater, NoteConstructor, QueryIntent, ScoredResult, SynthesisGenerator, SynthesisValidator, UpdateOperation, VectorRetriever

Constants: ENTITY_TYPES, RELATION_TYPES

Factory functions: get_edition, get_intent_classifier, get_knowledge_graph, get_memory_manager, get_synthesis_generator, get_synthesis_validator

Edition inspection: edition_name, is_community, is_enterprise, get_edition (also listed under factory functions; counted once in the total)

is_enterprise() returns True when the legacy SaaS compatibility hook reports active. is_community() is the inverse. edition_name() returns "ZettelForge + Extensions" when extensions are active, or "ZettelForge" for the base OSS installation. These identifiers are runtime compatibility hooks; the public product model remains ZettelForge OSS and ThreatRecall.ai SaaS.

See Memory Manager API for method-level signatures.

MCP server

The MCP server (mcp/server.py) exposes 7 tools over the MCP 2024-11-05 protocol:

Tool	Description
`zettelforge_remember`	Ingest a note with full governance pipeline
`zettelforge_recall`	Retrieve notes by query with TLP controls
`zettelforge_synthesize`	Generate an LLM-based synthesis
`zettelforge_entity`	Look up notes by entity type and value
`zettelforge_graph`	Traverse the knowledge graph
`zettelforge_stats`	Return store statistics
`zettelforge_sync`	OpenCTI sync (ThreatRecall.ai SaaS; returns 501 in OSS)

See MCP Protocol Reference for full input/output schemas.

CLI

python -m zettelforge demo     # interactive CTI demo (ingests 5 reports, runs recall + synthesis)
python -m zettelforge version  # print version string

ZettelForge OSS: boundaries and trade-offs

ZettelForge OSS (Apache-2.0 license) runs fully self-hosted with no external dependencies beyond Python packages. The following are deliberate scope constraints in the current release, not gaps:

No built-in multi-user authentication. The web server (/api/*) only applies an API key guard to non-loopback requests. Multi-user auth and per-tenant isolation are available in ThreatRecall.ai.
No encryption at rest. Data is stored in plain SQLite and JSONL. Apply OS-level encryption (dm-crypt, FileVault) for data-at-rest requirements.
SQLite KG lacks graph inference. The SQLite KG supports BFS traversal and edge queries but not TypeDB-style reasoning rules. TypeDB-based deep inference is available in ThreatRecall.ai SaaS.
Token estimation is naive. len(text) // 4 approximates tokens; actual token counts vary by model. This affects context window management in synthesis.
No embedding cache TTL. The LRU embedding cache is keyed by (model, content hash) with no expiry; memory consumption grows with unique content volume.
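
An LRU cache keyed by (model, content hash) with no expiry can be sketched with an OrderedDict. The class below is illustrative, not ZettelForge's cache: the name EmbeddingCache and the max_entries bound are assumptions, and the source does not state whether the real cache has a size cap.

```python
import hashlib
from collections import OrderedDict


class EmbeddingCache:
    """LRU cache keyed by (model, sha256(content)); entries never expire by age."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._data = OrderedDict()

    @staticmethod
    def _key(model, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        return (model, digest)

    def get(self, model, content):
        key = self._key(model, content)
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, model, content, vector):
        key = self._key(model, content)
        self._data[key] = vector
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._data)


cache = EmbeddingCache(max_entries=2)
cache.put("nomic", "alpha", [0.1])
cache.put("nomic", "beta", [0.2])
cache.get("nomic", "alpha")         # alpha becomes most recently used
cache.put("nomic", "gamma", [0.3])  # evicts beta
```

Without a TTL, an entry leaves the cache only through eviction, so a stale vector for unchanged content under the same model name would persist until it is pushed out.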