Tune LanceDB Vector Search¶
Configure LanceDB vector search for optimal retrieval quality. Adjust embedding model, index parameters, similarity threshold, and entity boost to balance precision and recall for your CTI workload.
Prerequisites¶
- ZettelForge installed (
pip install zettelforge) - Stored notes to test against (see Store Threat Actor)
Steps¶
1. Configure the embedding model¶
Edit config.yaml:
Or set via environment variables:
Supported configurations:
| Provider | Config value | Model | Dimensions | Notes |
|---|---|---|---|---|
| fastembed (default) | fastembed |
nomic-embed-text-v1.5-Q |
768 | In-process ONNX, ~130 MB, ~7ms/embed |
| Ollama (optional) | ollama |
nomic-embed-text-v2-moe:latest |
768 | Requires Ollama running on embedding.url |
| llama.cpp server | ollama |
nomic-embed-text-v2-moe.gguf |
768 | Any Ollama-compatible API endpoint |
[!WARNING] Changing the embedding model after data has been indexed requires a full re-index. Existing vectors become incompatible with new model embeddings. Run
python scripts/rebuild_index.pyafter changing models.
2. Verify embedding connectivity¶
from zettelforge.vector_memory import get_embedding
vector = get_embedding("APT28 uses Cobalt Strike for command and control")
print(f"Embedding dimensions: {len(vector)}")
print(f"First 5 values: {vector[:5]}")
Expected: Embedding dimensions: 768
3. Configure retrieval parameters¶
Edit the retrieval section of config.yaml:
Parameter reference:
| Parameter | Default | Range | Effect |
|---|---|---|---|
default_k |
10 |
1-100 | Max results returned per query |
similarity_threshold |
0.25 |
0.0-1.0 | Minimum cosine similarity to include a result |
entity_boost |
2.5 |
0.0-10.0 | Multiplicative boost per overlapping entity between query and note |
max_graph_depth |
2 |
1-5 | Hops to traverse in knowledge graph during blended retrieval |
4. Tune for high precision (fewer, more relevant results)¶
from zettelforge.memory_manager import MemoryManager
mm = MemoryManager()
notes = mm.recall("APT28 Cobalt Strike C2", domain="cti", k=5)
print(f"High-precision results: {len(notes)}")
5. Tune for high recall (cast a wide net)¶
notes = mm.recall("APT28 Cobalt Strike C2", domain="cti", k=25)
print(f"High-recall results: {len(notes)}")
[!TIP] Start with the defaults (
similarity_threshold: 0.25,entity_boost: 2.5). Lower the threshold only if relevant notes are being filtered out. Raise entity_boost if entity-specific queries return too much noise from semantically similar but entity-unrelated notes.
6. Understand IVF_PQ index settings¶
These defaults are optimal for collections up to ~1M notes. No manual tuning is needed below 10,000 notes.
[!NOTE] IVF_PQ index creation happens automatically when the LanceDB table exceeds a size threshold. You do not need to trigger index builds manually.
7. Configure the data directory¶
Or:
LanceDB stores its data at {data_dir}/lance/. The directory structure:
~/.amem/
notes.jsonl # Note metadata
lance/ # LanceDB vector index
kg_nodes.jsonl # Knowledge graph nodes
kg_edges.jsonl # Knowledge graph edges
entity_aliases.json # Local alias mappings
8. Rebuild the index after configuration changes¶
[!WARNING] Rebuilding the index re-embeds all notes. With the default fastembed provider this takes approximately 0.7 seconds per 100 notes.
LLM Quick Reference¶
Task: Configure and tune LanceDB vector search for CTI retrieval workloads.
Embedding config: embedding.provider (default fastembed, alternative ollama), embedding.model (default nomic-embed-text-v1.5-Q), embedding.dimensions (default 768). Env overrides: ZETTELFORGE_EMBEDDING_PROVIDER, AMEM_EMBEDDING_MODEL. The default fastembed provider runs in-process via ONNX with no external service required.
Retrieval config: retrieval.default_k (10), retrieval.similarity_threshold (0.25, range 0.0-1.0), retrieval.entity_boost (2.5, multiplicative per overlapping entity), retrieval.max_graph_depth (2, hops in KG traversal).
High precision preset: default_k: 5, similarity_threshold: 0.50, entity_boost: 3.0, max_graph_depth: 1.
High recall preset: default_k: 25, similarity_threshold: 0.10, entity_boost: 1.5, max_graph_depth: 3.
IVF_PQ defaults: 256 partitions, 16 sub-vectors (768 dims / 16 = 48 dims per sub-vector). Auto-created when table exceeds threshold. No manual trigger needed.
Storage: storage.data_dir (default ~/.amem). Env override: AMEM_DATA_DIR. LanceDB data at {data_dir}/lance/.
Re-index: python scripts/rebuild_index.py after changing embedding model. Required because vector dimensions/space change with model.