Skip to content

Integrate with LangChain

Use ZettelForge as a LangChain-compatible retriever in any RAG pipeline. The ZettelForgeRetriever wraps MemoryManager.recall() and converts ZettelForge MemoryNote objects into LangChain Document objects with rich metadata.

Prerequisites

  • ZettelForge installed (pip install zettelforge)
  • LangChain installed: pip install langchain-core langchain-community
  • Embedding and LLM models available (download automatically on first use)

Steps

1. Create a MemoryManager and seed it

from zettelforge import MemoryManager

mm = MemoryManager()

# Seed with CTI-relevant memories
mm.remember(
    "APT28 (Fancy Bear) uses spear-phishing emails with credential-harvesting links. "
    "They have been observed using domains mimicking NATO and defense contractors.",
    domain="security_ops",
)
mm.remember(
    "CVE-2024-3094: XZ Utils backdoor in versions 5.6.0 and 5.6.1. "
    "CVSS score 10.0. Supply chain attack affecting SSH authentication.",
    domain="security_ops",
)

2. Create the retriever

from zettelforge.integrations.langchain_retriever import ZettelForgeRetriever

retriever = ZettelForgeRetriever(
    memory_manager=mm,
    k=5,                     # Number of documents to return
)

3. Use the retriever directly

docs = retriever.invoke("What techniques does APT28 use?")

for doc in docs:
    print(f"Note ID: {doc.metadata['note_id']}")
    print(f"Domain: {doc.metadata['domain']}")
    print(f"Content: {doc.page_content[:100]}...")
    print("---")

4. Use the retriever in a LangChain RAG chain

from langchain.chains import RetrievalQA
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen3.5:9b", temperature=0.1)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)

answer = qa_chain.invoke("What is known about APT28's spear-phishing?")
print(answer["result"])

5. Configure retriever parameters

# Filter by domain
retriever = ZettelForgeRetriever(
    memory_manager=mm,
    k=10,
    domain="security_ops",    # Only search notes in this domain
)

# Control graph-linked notes and superseded filtering
retriever = ZettelForgeRetriever(
    memory_manager=mm,
    k=10,
    include_links=True,       # Include graph-linked notes (default True)
    exclude_superseded=True,  # Exclude superseded notes (default True)
)

6. Use as part of a conversational RAG pipeline

from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)

conversational_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
)

result = conversational_chain.invoke({
    "question": "What CVEs were in the XZ Utils incident?"
})
print(result["answer"])

7. Understand the metadata fields

Each LangChain Document carries the following metadata dictionary:

Field Type Description
note_id str ZettelForge internal note ID
source_type str Source type (e.g., report, conversation, sigma_rule)
source_ref str Source reference string
context str Semantic context summary
keywords list[str] Extracted keywords
tags list[str] Semantic tags
entities list[str] Extracted entities
domain str Memory domain
tier str Epistemic tier (A, B, C)
confidence float Note confidence score
importance int Importance rating (1-10)
created_at str ISO 8601 creation timestamp
updated_at str ISO 8601 update timestamp
cvss_v3_score float \| None CVSS v3 score (CVE notes only)
cisa_kev bool \| None CISA KEV status (CVE notes only)

8. Use with a custom prompt template

from langchain.prompts import PromptTemplate

template = """
You are a CTI analyst reviewing intelligence from past investigations.
Use the following pieces of context to answer the question.
If you don't know the answer, say so — do not fabricate information.

Context:
{context}

Question: {question}

Answer with specific entities, techniques, and confidence levels where possible:
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"],
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
)

LLM Quick Reference

Task: Integrate ZettelForge memory as a LangChain retriever.

Class: ZettelForgeRetriever from zettelforge.integrations.langchain_retriever

Constructor parameters: - memory_manager: MemoryManager — required ZettelForge instance - k: int = 10 — number of documents to return - domain: str | None = None — filter by memory domain - include_links: bool = True — include graph-linked notes - exclude_superseded: bool = True — exclude superseded notes

Returns: list[Document] where each Document.page_content is the raw note content and Document.metadata contains all ZettelForge metadata fields.

Blended search: The retriever uses MemoryManager.recall() which performs blended vector similarity search and knowledge graph traversal, then ranks results by combined relevance.

CVE metadata: Notes with vulnerability data include cvss_v3_score and cisa_kev in metadata fields automatically.

Standard LangChain: Implements BaseRetriever from langchain_core so it works with RetrievalQA, ConversationalRetrievalChain, create_retrieval_chain, and any other LangChain API that accepts a retriever.

Serialization: The retriever uses Pydantic ConfigDict(arbitrary_types_allowed=True) so the MemoryManager field is accepted as an arbitrary type without validation errors.