Configuration Reference
Module: zettelforge.config
from zettelforge.config import get_config, reload_config, ZettelForgeConfig
Resolution Order
Configuration values are resolved with highest priority first:
| Priority |
Source |
Example |
| 1 (highest) |
Environment variables |
TYPEDB_HOST=db.internal |
| 2 |
config.yaml in working directory |
./config.yaml |
| 3 |
config.yaml in project root |
<project>/config.yaml |
| 4 |
config.default.yaml in project root |
<project>/config.default.yaml |
| 5 (lowest) |
Hardcoded defaults in config.py |
Dataclass field defaults |
Config Access
cfg = get_config() # Load once, cached singleton
cfg = reload_config() # Force reload from file + env
cfg.typedb.host # "localhost"
cfg.retrieval.default_k # 10
cfg.backend # "typedb"
All Configuration Keys
storage
@dataclass
class StorageConfig:
data_dir: str = "~/.amem"
| Key |
Type |
Default |
Env Override |
Description |
storage.data_dir |
str |
~/.amem |
AMEM_DATA_DIR |
Root directory for LanceDB vectors, JSONL notes, entity indexes, and snapshots. |
typedb (zettelforge-enterprise only)
@dataclass
class TypeDBConfig:
host: str = "localhost"
port: int = 1729
database: str = "zettelforge"
username: str = ""
password: str = ""
| Key |
Type |
Default |
Env Override |
Description |
typedb.host |
str |
localhost |
TYPEDB_HOST |
TypeDB server hostname or IP. |
typedb.port |
int |
1729 |
TYPEDB_PORT |
TypeDB server port. |
typedb.database |
str |
zettelforge |
TYPEDB_DATABASE |
TypeDB database name. |
typedb.username |
str |
"" |
TYPEDB_USERNAME |
TypeDB authentication username. Supply via env var or ${TYPEDB_USERNAME} in config.yaml. |
typedb.password |
str |
"" |
TYPEDB_PASSWORD |
TypeDB authentication password. Supply via env var or ${TYPEDB_PASSWORD} in config.yaml. |
backend
| Key |
Type |
Default |
Env Override |
Description |
backend |
str |
sqlite |
ZETTELFORGE_BACKEND |
Storage backend for notes, knowledge graph, and entity index. Community uses sqlite. TypeDB is extension-only. Legacy JSONL data should be migrated to SQLite. |
embedding
@dataclass
class EmbeddingConfig:
provider: str = "fastembed"
url: str = "http://127.0.0.1:11434"
model: str = "nomic-ai/nomic-embed-text-v1.5-Q"
dimensions: int = 768
| Key |
Type |
Default |
Env Override |
Description |
embedding.provider |
str |
fastembed |
ZETTELFORGE_EMBEDDING_PROVIDER |
Embedding provider. Values: fastembed (in-process ONNX, default), ollama (requires Ollama running at embedding.url). |
embedding.url |
str |
http://127.0.0.1:11434 |
AMEM_EMBEDDING_URL |
Embedding server URL. Only used when embedding.provider is ollama. |
embedding.model |
str |
nomic-ai/nomic-embed-text-v1.5-Q |
AMEM_EMBEDDING_MODEL |
Embedding model name. Must be a fastembed-supported identifier (full HF form such as nomic-ai/nomic-embed-text-v1.5-Q for fastembed; Ollama tags like nomic-embed-text when provider=ollama). Default is nomic-ai/nomic-embed-text-v1.5-Q (768-dim, ~130 MB, ~7 ms/embed). |
embedding.dimensions |
int |
768 |
ZETTELFORGE_EMBEDDING_DIM |
Vector dimensionality. Must match the model output. If you change the embedding model, update this value and run rebuild_index.py to re-embed all notes. Common values: 768 (nomic), 1024 (mxbai), 1536 (OpenAI), 4096 (qwen3). |
llm
@dataclass
class LLMConfig:
provider: str = "ollama"
model: str = "qwen3.5:9b"
url: str = "http://localhost:11434"
api_key: str = "" # supports ${ENV_VAR} references
temperature: float = 0.1
timeout: float = 180.0 # v2.5.2 — reasoning-model headroom
max_retries: int = 2
fallback: str = "" # "" preserves implicit local->ollama fallback
local_backend: str = "llama-cpp-python" # RFC-011
max_tokens: int = 400
max_tokens_causal: int = 8000
max_tokens_synthesis: int = 2500
max_tokens_extraction: int = 2500
max_tokens_ner: int = 2500
max_tokens_evolve: int = 2500
reasoning_model: bool = False
extra: dict = field(default_factory=dict)
| Key |
Type |
Default |
Env Override |
Description |
llm.provider |
str |
ollama |
ZETTELFORGE_LLM_PROVIDER |
LLM provider name. Values shipped in core: local (in-process inference), ollama (HTTP), litellm (LiteLLM router to 100+ providers via zettelforge[litellm]), mock (tests only). Third-party providers register via the zettelforge.llm_providers entry point. |
llm.model |
str |
qwen3.5:9b |
ZETTELFORGE_LLM_MODEL |
Model identifier. Meaning is provider-specific: Ollama tag (qwen2.5:3b), HuggingFace repo for local (Qwen/Qwen2.5-3B-Instruct-GGUF), LiteLLM model name (gpt-4o, claude-sonnet-4-20250514, gemini/gemini-2.0-flash), OpenAI-compatible model name, or Anthropic model ID. |
llm.url |
str |
http://localhost:11434 |
ZETTELFORGE_LLM_URL |
Base URL. Meaning is provider-specific -- Ollama endpoint for ollama, /v1/chat/completions base for openai_compat, ignored for local and litellm. |
llm.api_key |
str |
"" |
ZETTELFORGE_LLM_API_KEY |
API key for authenticated providers (litellm, openai_compat). Accepts ${ENV_VAR} references -- never commit raw keys. Redacted from repr(LLMConfig). For litellm, you may also rely on standard environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) instead of config. |
llm.temperature |
float |
0.1 |
-- |
Sampling temperature. 0.0 = deterministic, 0.1 = near-deterministic (default), 0.7 = creative. |
llm.timeout |
float |
180.0 |
ZETTELFORGE_LLM_TIMEOUT |
Request timeout in seconds. Applied by ollama (RFC-010) and litellm providers. Bumped 60s → 180s in v2.5.2 because reasoning-model generations at the per-call-site max_tokens budgets routinely take 60–140s on a 9B-Q4_K_M model and the old default was firing ReadTimeout mid-generation. Lower this only on faster hardware where you've measured the 99th-percentile call duration. |
llm.max_retries |
int |
2 |
ZETTELFORGE_LLM_MAX_RETRIES |
Number of retries on transient failure. Applied by litellm (via num_retries kwarg). |
llm.fallback |
str |
"" |
ZETTELFORGE_LLM_FALLBACK |
Backup provider invoked when the primary fails with a non-configuration error. Empty string preserves the implicit local -> ollama fallback for backward compatibility; set explicitly to any other registered provider to route elsewhere. |
llm.local_backend |
str |
llama-cpp-python |
ZETTELFORGE_LLM_LOCAL_BACKEND |
In-process inference engine when provider: local. Options: llama-cpp-python (GGUF, default, requires zettelforge[local]), onnxruntime-genai (ONNX, requires zettelforge[local-onnx]). Ignored for all other providers. |
llm.max_tokens |
int |
400 |
ZETTELFORGE_LLM_MAX_TOKENS |
Default generation budget when a caller does not pass an explicit max_tokens. |
llm.max_tokens_causal |
int |
8000 |
ZETTELFORGE_LLM_MAX_TOKENS_CAUSAL |
Budget for causal triple extraction. |
llm.max_tokens_synthesis |
int |
2500 |
ZETTELFORGE_LLM_MAX_TOKENS_SYNTHESIS |
Budget for RAG synthesis JSON output. |
llm.max_tokens_extraction |
int |
2500 |
ZETTELFORGE_LLM_MAX_TOKENS_EXTRACTION |
Budget for two-phase fact extraction. |
llm.max_tokens_ner |
int |
2500 |
ZETTELFORGE_LLM_MAX_TOKENS_NER |
Budget for conversational LLM NER and its retry path. |
llm.max_tokens_evolve |
int |
2500 |
ZETTELFORGE_LLM_MAX_TOKENS_EVOLVE |
Budget for memory evolution decisions and retry path. |
llm.reasoning_model |
bool |
false |
ZETTELFORGE_LLM_REASONING_MODEL |
Raises timeout and per-call-site budgets to the known-good reasoning-model floors without lowering already larger operator overrides. |
llm.extra |
dict |
{} |
-- |
Provider-specific kwargs forwarded to the constructor. String values inside extra also honour ${ENV_VAR} resolution. Common uses: {filename: qwen2.5-3b-instruct-q4_k_m.gguf, n_ctx: 4096} for local provider; {provider: rocm} for onnxruntime-genai execution provider selection; {drop_params: true} for litellm. |
Provider quick-reference
| Provider |
Install |
Config Example |
Model Format |
Notes |
local |
pip install zettelforge[local] |
provider: local + local_backend: llama-cpp-python |
HuggingFace GGUF repo ID |
In-process, fully offline. See local_backend for engine selection. |
local + onnx |
pip install zettelforge[local-onnx] |
provider: local + local_backend: onnxruntime-genai |
HuggingFace ONNX repo ID |
In-process, ROCm/DirectML/CoreML support. |
ollama |
core (no extra) |
provider: ollama + url: http://localhost:11434 |
Ollama tag (qwen3.5:9b) |
Requires ollama serve running. |
litellm |
pip install zettelforge[litellm] |
provider: litellm + model: gpt-4o |
LiteLLM model name |
Routes to 100+ providers by model prefix. |
mock |
core (no extra) |
provider: mock |
N/A |
Deterministic canned responses for testing. |
Per-call-site max_tokens budgets (v2.6.0: config-driven)
llm.timeout is config-driven. v2.6.0 moved max_tokens (Ollama num_predict) values to LLMConfig, making them config-overridable per call site. v2.5.2 raised these caps to give reasoning models (qwen3.5+, qwen3.6, nemotron-3) room to emit their hidden <think>...</think> tokens and a final answer; pre-2.5.2 caps were exhausted entirely by reasoning, leaving the JSON answer empty.
| Call site |
File |
Budget |
Why |
Causal triple extraction (note_constructor.extract_causal_triples) |
note_constructor.py |
8000 |
Highest in the codebase. Asks the model to enumerate every causal relation in a passage; reasoning chain is longest. |
Synthesis (synthesis_generator._generate_synthesis) |
synthesis_generator.py |
2500 |
Single-answer prompt; reasoning overhead is moderate. |
Fact extraction (fact_extractor.extract) |
fact_extractor.py |
2500 |
Multi-fact JSON enumeration. |
LLM NER (entity_indexer._extract_via_llm, retry path) |
entity_indexer.py |
2500 |
Conversational entity types only; regex covers CTI types separately. |
Memory evolution (memory_evolver.evolve_neighbors and retry) |
memory_evolver.py |
2500 |
Two-note comparison + decision JSON. |
Operational impact. Causal extraction at 8000 tokens runs 60–140 s per call on a 9B-Q4_K_M reasoning model; remember(sync=True) therefore blocks 1–3 minutes per note. The default async path (background enrichment queue) is unaffected — these calls happen off the write hot path. If you're triggering sync=True or doing bulk ingestion you'll feel the latency; switch to async or scale up the model server.
If you're on faster hardware or smaller non-reasoning models you can monkey-patch lower values, but the 2.5.2 defaults trade latency for correctness on the reference qwen3.5:9b setup. v2.6.0 moves these to LLMConfig so you can override per call site without touching code — see issue #125.
LiteLLM model prefix examples
LiteLLM routes requests to the correct provider based on model name prefix. The model field determines the backend automatically -- no separate config needed.
| Model Name |
Routes To |
Required Env Var |
gpt-4o, gpt-4o-mini |
OpenAI |
OPENAI_API_KEY |
claude-sonnet-4-20250514 |
Anthropic |
ANTHROPIC_API_KEY |
gemini/gemini-2.0-flash |
Google Gemini |
GOOGLE_API_KEY |
groq/llama-3.3-70b-versatile |
Groq |
GROQ_API_KEY |
together_ai/meta-llama/Llama-3.3-70B |
Together AI |
TOGETHER_API_KEY |
openrouter/anthropic/claude-3.5-sonnet |
OpenRouter |
OPENROUTER_API_KEY |
bedrock/anthropic.claude-3-sonnet-v1 |
AWS Bedrock |
AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY |
vertex_ai/claude-3-sonnet@20240229 |
Google Vertex |
GOOGLE_APPLICATION_CREDENTIALS |
llm_ner
@dataclass
class LLMNerConfig:
enabled: bool = True
| Key |
Type |
Default |
Env Override |
Description |
llm_ner.enabled |
bool |
True |
ZETTELFORGE_LLM_NER_ENABLED |
Enable always-on LLM Named Entity Recognition. When True, every remember() call enqueues a background LLM NER job that augments the fast regex-based entity extraction with conversational entities (person, location, organization, event, activity, temporal). Fast-path writes still return in ~45 ms; LLM NER runs asynchronously via the enrichment queue and merges into the note's entity set when it completes. Set False for air-gapped or benchmark runs that need deterministic regex-only extraction. |
@dataclass
class ExtractionConfig:
max_facts: int = 5
min_importance: int = 3
| Key |
Type |
Default |
Env Override |
Description |
extraction.max_facts |
int |
5 |
-- |
Maximum facts extracted per remember_with_extraction() call. |
extraction.min_importance |
int |
3 |
-- |
Facts scored below this threshold are discarded. Range: 1--10. |
retrieval
@dataclass
class RetrievalConfig:
default_k: int = 10
similarity_threshold: float = 0.25
entity_boost: float = 2.5
max_graph_depth: int = 2
| Key |
Type |
Default |
Env Override |
Description |
retrieval.default_k |
int |
10 |
-- |
Default number of results for recall(). |
retrieval.similarity_threshold |
float |
0.25 |
-- |
Minimum cosine similarity to include a vector result (0.0--1.0). Note: VectorRetriever constructor overrides this to 0.15 at runtime. |
retrieval.entity_boost |
float |
2.5 |
-- |
Multiplicative boost per overlapping entity between query and note. |
retrieval.max_graph_depth |
int |
2 |
-- |
Maximum BFS hops in the knowledge graph. |
synthesis
@dataclass
class SynthesisConfig:
max_context_tokens: int = 3000
default_format: str = "direct_answer"
tier_filter: List[str] = field(default_factory=lambda: ["A", "B"])
| Key |
Type |
Default |
Env Override |
Description |
synthesis.max_context_tokens |
int |
3000 |
-- |
Maximum tokens in the synthesis context window. |
synthesis.default_format |
str |
direct_answer |
-- |
Default synthesis output format. Values: direct_answer, synthesized_brief, timeline_analysis, relationship_map. |
synthesis.tier_filter |
List[str] |
["A", "B"] |
-- |
Epistemic tiers to include. A = authoritative, B = operational, C = support. |
governance
@dataclass
class GovernanceConfig:
enabled: bool = True
min_content_length: int = 1
pii: PIIConfig = field(default_factory=PIIConfig)
limits: LimitsConfig = field(default_factory=LimitsConfig)
memory_defense: MemoryDefenseConfig = field(default_factory=MemoryDefenseConfig)
| Key |
Type |
Default |
Env Override |
Description |
governance.enabled |
bool |
True |
-- |
Enable governance validation on remember() operations. Set False for benchmarks. |
governance.min_content_length |
int |
1 |
-- |
Minimum character length for content passed to remember(). |
governance.pii (RFC-013, optional)
@dataclass
class PIIConfig:
enabled: bool = False
action: str = "log"
redact_placeholder: str = "[REDACTED]"
entities: list[str] = field(default_factory=list)
language: str = "en"
nlp_model: str = "en_core_web_sm"
| Key |
Type |
Default |
Env Override |
Description |
governance.pii.enabled |
bool |
False |
ZETTELFORGE_PII_ENABLED |
Enable Microsoft Presidio PII detection during remember(). Soft dependency -- requires pip install zettelforge[pii] to activate. With enabled=true but the SDK missing, GovernanceValidator logs pii_validator_unavailable at WARNING and continues with _pii=None (every PII codepath becomes a no-op pass-through). |
governance.pii.action |
str |
log |
ZETTELFORGE_PII_ACTION |
Policy when PII is detected: log (warn-only, content passes through unchanged), redact (replace each finding with redact_placeholder before storage), or block (raise PIIBlockedError and refuse the write). |
governance.pii.redact_placeholder |
str |
[REDACTED] |
-- |
String substituted for each PII span when action=redact. |
governance.pii.entities |
list[str] |
[] (= all) |
-- |
Entity types to detect. Empty list = detect every Presidio-supported type. CTI allowlist (IP_ADDRESS, URL, DOMAIN_NAME) is always filtered out automatically since these are legitimate threat-intel indicators, not personal data. |
governance.pii.language |
str |
en |
-- |
Two-letter language code passed to Presidio's analyzer. |
governance.pii.nlp_model |
str |
en_core_web_sm |
-- |
spaCy NLP model name. Larger models (en_core_web_lg) catch more entities at the cost of memory and warm-up time. |
See Configure PII Detection for the full Presidio setup guide.
governance.limits (RFC-014)
@dataclass
class LimitsConfig:
max_content_length: int = 52428800
recall_timeout_seconds: float = 30.0
| Key |
Type |
Default |
Env Override |
Description |
governance.limits.max_content_length |
int |
52428800 |
ZETTELFORGE_LIMITS_MAX_CONTENT_LENGTH |
Maximum content length in bytes for remember(). 0 = unlimited. 50 MB default. |
governance.limits.recall_timeout_seconds |
float |
30.0 |
ZETTELFORGE_LIMITS_RECALL_TIMEOUT |
Maximum seconds for a recall() query. 0 = unlimited. |
governance.memory_defense (SEC-011 / MemSAD)
@dataclass
class MemoryDefenseConfig:
enabled: bool = True
mode: str = "audit"
min_calibration_notes: int = 50
max_reference_notes: int = 50
kappa: float = 2.0
lexical_weight: float = 0.25
ngram_size: int = 3
monitored_domains: list[str] = field(default_factory=list)
quarantine_path: str = ""
quarantine_raw_content: bool = True
| Key |
Type |
Default |
Env Override |
Description |
governance.memory_defense.enabled |
bool |
True |
ZETTELFORGE_MEMORY_DEFENSE_ENABLED |
Enable write-time memory-poisoning anomaly evaluation before notes are persisted or indexed. |
governance.memory_defense.mode |
str |
audit |
ZETTELFORGE_MEMORY_DEFENSE_MODE |
Policy for flagged writes: audit logs only, block rejects the write, quarantine writes a forensic JSONL record and rejects the write. |
governance.memory_defense.min_calibration_notes |
int |
50 |
ZETTELFORGE_MEMORY_DEFENSE_MIN_CALIBRATION |
Minimum same-domain reference notes required before thresholding. Below this count, writes are allowed with calibration_insufficient audit metadata. |
governance.memory_defense.max_reference_notes |
int |
50 |
-- |
Maximum recent same-domain reference notes used for calibration and scoring. |
governance.memory_defense.kappa |
float |
2.0 |
ZETTELFORGE_MEMORY_DEFENSE_KAPPA |
Threshold multiplier: mean + kappa * stddev over calibration scores. |
governance.memory_defense.lexical_weight |
float |
0.25 |
-- |
Weight applied to character n-gram Jensen-Shannon divergence, complementing embedding similarity against synonym/paraphrase evasion. |
governance.memory_defense.ngram_size |
int |
3 |
-- |
Character n-gram size for lexical divergence. |
governance.memory_defense.monitored_domains |
list[str] |
[] |
-- |
Domains to evaluate. Empty list means every domain. |
governance.memory_defense.quarantine_path |
str |
"" |
-- |
JSONL quarantine path. Empty uses <storage.data_dir>/quarantine/memory_anomalies.jsonl. |
governance.memory_defense.quarantine_raw_content |
bool |
True |
-- |
Include raw rejected content in quarantine records. Disable if quarantine storage is not approved for raw content. |
web (RFC-015)
lance (RFC-009 Phase 1.5)
@dataclass
class LanceConfig:
cleanup_interval_minutes: int = 60
cleanup_older_than_seconds: int = 3600
| Key |
Type |
Default |
Env Override |
Description |
lance.cleanup_interval_minutes |
int |
60 |
-- |
How often the LanceDB version-prune daemon wakes per shard. Set 0 to disable the daemon entirely -- use only for diagnostics or when an external compaction process owns the data dir. |
lance.cleanup_older_than_seconds |
int |
3600 |
-- |
Minimum age of a LanceDB version before it becomes prune-eligible. Lower values reclaim disk faster but increase the chance of pruning a version a concurrent reader is still using. The 3600 s (1 h) default is the conservative value validated against the 2026-04-24 Vigil incident; do not lower below 600 s without measuring concurrent-reader behaviour first. |
Operational impact. Without this daemon, every remember() write appends an immutable version row to LanceDB. Over weeks of writes the version chain grows to multi-gigabyte overhead that quintuples remember() p95 latency (Vigil 2026-04-24: 5.66 GB version-chain -> remember() p95 = 49.8 s; one-shot cleanup_old_versions shrank to 29 MB -> p95 = 250 ms). The daemon ships enabled by default -- you only need to touch this section if you've moved compaction to an out-of-process job.
web (RFC-015)
@dataclass
class WebConfig:
enabled: bool = True
host: str = "0.0.0.0"
port: int = 8088
ui_dir: str = ""
| Key |
Type |
Default |
Env Override |
Description |
web.enabled |
bool |
True |
ZETTELFORGE_WEB_ENABLED |
Enable the web management interface. Set False for library-only deployments. |
web.host |
str |
0.0.0.0 |
-- |
Bind address for the FastAPI server. |
web.port |
int |
8088 |
ZETTELFORGE_WEB_PORT |
Port for the FastAPI server. |
web.ui_dir |
str |
"" |
ZETTELFORGE_WEB_UI_DIR |
Custom path to the SPA UI directory. Empty = web/ui/ relative to project root. |
See Use the Web Management Interface for setup steps and Web API Reference for endpoint documentation.
cache
@dataclass
class CacheConfig:
ttl_seconds: int = 300
max_entries: int = 1024
| Key |
Type |
Default |
Env Override |
Description |
cache.ttl_seconds |
int |
300 |
-- |
Cache entry time-to-live in seconds. Set 0 to disable caching. |
cache.max_entries |
int |
1024 |
-- |
Maximum cache entries. Set 0 to disable caching. |
logging
@dataclass
class LoggingConfig:
level: str = "INFO"
log_intents: bool = True
log_causal: bool = True
| Key |
Type |
Default |
Env Override |
Description |
logging.level |
str |
INFO |
-- |
Minimum log level. Values: DEBUG, INFO, WARNING, ERROR. |
logging.log_intents |
bool |
True |
-- |
Log intent classification results during recall(). |
logging.log_causal |
bool |
True |
-- |
Log causal triple extraction results during remember(). |
opencti
[!NOTE]
This section is only active in ZettelForge Enterprise. It has no effect in the Community edition.
@dataclass
class OpenCTIConfig:
url: str = "http://localhost:8080"
token: str = ""
sync_interval: int = 0
| Key |
Type |
Default |
Env Override |
Description |
opencti.url |
str |
http://localhost:8080 |
OPENCTI_URL |
Base URL of the OpenCTI platform. Use https:// for cloud deployments. |
opencti.token |
str |
"" |
OPENCTI_TOKEN |
OpenCTI API token. Always set via OPENCTI_TOKEN -- never commit a token to config.yaml. |
opencti.sync_interval |
int |
0 |
OPENCTI_SYNC_INTERVAL |
Seconds between automatic pulls from OpenCTI. Set 0 to disable auto-sync and pull manually. |
Minimal opencti config.yaml block:
opencti:
url: http://localhost:8080
token: "" # Set via OPENCTI_TOKEN env var
sync_interval: 3600 # Pull every hour; 0 = manual only
Supported entity types for pull/push:
| Entity Type |
Pull |
Push |
Structured Fields |
attack_pattern |
yes |
-- |
MITRE ATT&CK ID, tactic |
intrusion_set |
yes |
-- |
Aliases, motivation, resource level |
threat_actor |
yes |
-- |
Aliases, sophistication |
malware |
yes |
-- |
Types, implementation languages, is_family |
indicator |
yes |
-- |
STIX pattern, valid_from, valid_until |
vulnerability |
yes |
-- |
CVSS v3 score/vector, EPSS score/percentile, CISA KEV |
report |
yes |
yes |
Publication date, confidence, object_refs |
All entities preserve tlp (TLP marking label: WHITE, GREEN, AMBER, or RED) and stix_confidence (STIX integer 0–100; -1 when unset in OpenCTI).
See Configure OpenCTI Integration for setup steps, pull/push examples, and troubleshooting.
Environment Variables Summary
General configuration
| Variable |
Maps To |
Example |
AMEM_DATA_DIR |
storage.data_dir |
/data/zettelforge |
TYPEDB_HOST |
typedb.host |
db.internal |
TYPEDB_PORT |
typedb.port |
1729 |
TYPEDB_DATABASE |
typedb.database |
zettelforge |
TYPEDB_USERNAME |
typedb.username |
admin |
TYPEDB_PASSWORD |
typedb.password |
s3cret |
ZETTELFORGE_BACKEND |
backend |
sqlite |
ZETTELFORGE_EMBEDDING_PROVIDER |
embedding.provider |
ollama |
AMEM_EMBEDDING_URL |
embedding.url |
http://gpu-box:11434 |
AMEM_EMBEDDING_MODEL |
embedding.model |
nomic-embed-text-v1.5-Q |
ZETTELFORGE_LLM_NER_ENABLED |
llm_ner.enabled |
true |
LLM configuration
| Variable |
Maps To |
Example |
ZETTELFORGE_LLM_PROVIDER |
llm.provider |
litellm |
ZETTELFORGE_LLM_MODEL |
llm.model |
gpt-4o |
ZETTELFORGE_LLM_URL |
llm.url |
http://gpu-box:11434 |
ZETTELFORGE_LLM_API_KEY |
llm.api_key |
sk-... |
ZETTELFORGE_LLM_TIMEOUT |
llm.timeout |
180 |
ZETTELFORGE_LLM_MAX_RETRIES |
llm.max_retries |
2 |
ZETTELFORGE_LLM_FALLBACK |
llm.fallback |
ollama |
ZETTELFORGE_LLM_LOCAL_BACKEND |
llm.local_backend |
onnxruntime-genai |
ZETTELFORGE_LLM_MAX_TOKENS |
llm.max_tokens |
400 |
ZETTELFORGE_LLM_MAX_TOKENS_CAUSAL |
llm.max_tokens_causal |
8000 |
ZETTELFORGE_LLM_MAX_TOKENS_SYNTHESIS |
llm.max_tokens_synthesis |
2500 |
ZETTELFORGE_LLM_MAX_TOKENS_EXTRACTION |
llm.max_tokens_extraction |
2500 |
ZETTELFORGE_LLM_MAX_TOKENS_NER |
llm.max_tokens_ner |
2500 |
ZETTELFORGE_LLM_MAX_TOKENS_EVOLVE |
llm.max_tokens_evolve |
2500 |
ZETTELFORGE_LLM_REASONING_MODEL |
llm.reasoning_model |
true |
Web UI configuration (RFC-015)
| Variable |
Maps To |
Example |
ZETTELFORGE_WEB_ENABLED |
web.enabled |
true |
ZETTELFORGE_WEB_PORT |
web.port |
8088 |
ZETTELFORGE_WEB_UI_DIR |
web.ui_dir |
/opt/zettelforge/ui |
Enterprise-only (OpenCTI)
| Variable |
Maps To |
Example |
OPENCTI_URL |
opencti.url |
https://opencti.corp.internal |
OPENCTI_TOKEN |
opencti.token |
abc123... |
OPENCTI_SYNC_INTERVAL |
opencti.sync_interval |
3600 |
Note: The opencti configuration section and OPENCTI_* environment-variable mapping are implemented in the Enterprise package. In Community builds, these values are ignored by src/zettelforge/config.py.
Minimal config.yaml
storage:
data_dir: ~/.amem
backend: sqlite
embedding:
provider: fastembed
model: nomic-embed-text-v1.5-Q
llm:
provider: local
model: Qwen2.5-3B-Instruct-Q4_K_M.gguf
Example Configurations by Use Case
Local in-process (fully offline, default)
llm:
provider: local
model: Qwen/Qwen2.5-3B-Instruct-GGUF
extra:
filename: qwen2.5-3b-instruct-q4_k_m.gguf
n_ctx: 4096
Local in-process with ONNX (AMD GPU)
llm:
provider: local
local_backend: onnxruntime-genai
model: microsoft/Phi-3-mini-4k-instruct-onnx
extra:
filename: phi3-mini-4k-instruct-q4.onnx
provider: rocm
Ollama (default config)
llm:
provider: ollama
model: qwen3.5:9b
url: http://localhost:11434
LiteLLM -- OpenAI
llm:
provider: litellm
model: gpt-4o
api_key: ${OPENAI_API_KEY}
LiteLLM -- Anthropic
llm:
provider: litellm
model: claude-sonnet-4-20250514
api_key: ${ANTHROPIC_API_KEY}
LiteLLM -- multiple providers via environment variables
llm:
provider: litellm
model: gpt-4o # Switch model to change provider
# API keys read from env: OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.
LiteLLM -- Groq (fast inference)
llm:
provider: litellm
model: groq/llama-3.3-70b-versatile
api_key: ${GROQ_API_KEY}
LLM Quick Reference
ZettelForge configuration uses a layered resolution system: environment variables override config.yaml, which overrides config.default.yaml, which overrides hardcoded dataclass defaults. Access configuration via get_config() which returns a cached ZettelForgeConfig singleton. Call reload_config() to force a re-read.
21 environment variables are supported, covering storage (AMEM_DATA_DIR), TypeDB connection (TYPEDB_HOST, TYPEDB_PORT, TYPEDB_DATABASE, TYPEDB_USERNAME, TYPEDB_PASSWORD), backend selection (ZETTELFORGE_BACKEND), embedding provider (ZETTELFORGE_EMBEDDING_PROVIDER, AMEM_EMBEDDING_URL, AMEM_EMBEDDING_MODEL), LLM provider (ZETTELFORGE_LLM_PROVIDER, ZETTELFORGE_LLM_MODEL, ZETTELFORGE_LLM_URL, ZETTELFORGE_LLM_API_KEY, ZETTELFORGE_LLM_TIMEOUT, ZETTELFORGE_LLM_MAX_RETRIES, ZETTELFORGE_LLM_FALLBACK, ZETTELFORGE_LLM_LOCAL_BACKEND), web UI (ZETTELFORGE_WEB_ENABLED, ZETTELFORGE_WEB_PORT, ZETTELFORGE_WEB_UI_DIR), and OpenCTI integration (OPENCTI_URL, OPENCTI_TOKEN, OPENCTI_SYNC_INTERVAL).
14 config sections exist: storage (data directory), typedb (Enterprise TypeDB connection parameters), backend (community default: sqlite), embedding (vector model and server), llm (language model for extraction/synthesis with provider, model, API key, timeout, retry, fallback, local_backend, and extra), extraction (two-phase pipeline settings), retrieval (vector search tuning), synthesis (RAG output control), governance (validation toggle with pii and limits subsections), cache (query cache), logging (verbosity control), lance (LanceDB maintenance), web (web management interface host/port/ui_dir), and opencti (Enterprise only -- OpenCTI platform URL, token, and sync interval).
Key defaults: Data stored in ~/.amem. Backend is SQLite (TypeDB available via zettelforge-enterprise extension). Embedding via fastembed in-process with nomic-embed-text-v1.5-Q (768 dims, ONNX). LLM via Ollama at http://localhost:11434 with qwen3.5:9b at temperature 0.1. The local provider uses llama-cpp-python in-process with Qwen2.5-3B-Instruct-Q4_K_M.gguf. Models download automatically on first use. The litellm provider (optional, pip install zettelforge[litellm]) routes to 100+ providers by model name prefix. Extraction produces up to 5 facts with importance >= 3. Retrieval returns 10 results with 0.25 similarity threshold and 2.5x entity boost. Synthesis uses direct_answer format with A+B tier notes and 3000 token context. Cache TTL is 300 seconds with 1024 max entries. Logging at INFO level.
For air-gapped deployments: Keep backend: sqlite and use provider: local with llama-cpp-python or onnxruntime-genai. Pre-download embedding and LLM models before going offline. Legacy JSONL files are migration input, not the community default backend.
For cloud-connected deployments: Use provider: litellm with pip install zettelforge[litellm] to access OpenAI, Anthropic, Google, Groq, Together AI, AWS Bedrock, and 100+ other providers through a single interface. Set api_key in config or rely on standard environment variables.