Ingest a threat report

Ingest threat reports of any length using remember_report(). ZettelForge chunks content on sentence boundaries, runs the two-phase extraction pipeline on each chunk, deduplicates against existing notes, and stores the publication date as temporal metadata so you can query by time later.

Prerequisites

ZettelForge installed: pip install zettelforge
A configured LLM provider. The two-phase extraction pipeline (Phase 1 fact extraction, Phase 2 update decision) requires an active LLM. Without one, remember_report() returns an empty list. See Quickstart to configure your provider.

Steps

1. Prepare report content

report_content = """
Volt Typhoon Campaign Analysis — March 2026

Executive Summary: Volt Typhoon (Bronze Silhouette) continued targeting
U.S. critical infrastructure in Q1 2026, focusing on water treatment
facilities and energy grid operators in the Pacific Northwest.

Initial access leveraged living-off-the-land binaries (LOLBins) and
compromised SOHO routers as operational relay nodes. No custom malware
was deployed; the group relied exclusively on built-in Windows tools
including PowerShell, certutil, and netsh for lateral movement.

The campaign exploited CVE-2024-3094 in xz-utils on exposed Linux
jump hosts to establish footholds in hybrid environments. CISA issued
advisory AA26-091A on March 15, 2026.

Attribution confidence: HIGH (NSA/CISA joint assessment).
Linked infrastructure overlaps with previous Volt Typhoon campaigns
tracked since May 2023.
"""

2. Ingest with `remember_report()`

from zettelforge.memory_manager import MemoryManager

mm = MemoryManager()

results = mm.remember_report(
    content=report_content,
    source_url="https://example.com/volt-typhoon-q1-2026",
    published_date="2026-03-20",
    domain="cti",
)

print(f"Total facts processed: {len(results)}")
for note, status in results:
    if note:
        print(f"  [{status}] {note.id}: {note.content.raw[:80]}...")

Content under 3000 characters (the default chunk_size) is processed as a single chunk. Longer reports are split on `. ` (a period followed by a space) sentence boundaries; each chunk runs independently through the extraction pipeline.
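The splitting rule can be sketched as follows. This is a minimal illustration, not ZettelForge's actual chunker: `split_on_sentences` is a hypothetical helper, and the greedy packing of sentences up to `chunk_size` is an assumption about how such a splitter might behave.

```python
def split_on_sentences(text: str, chunk_size: int = 3000) -> list[str]:
    """Hypothetical sketch: pack sentences into chunks of at most chunk_size chars."""
    if len(text) <= chunk_size:
        return [text]  # short content stays a single chunk

    parts = text.split(". ")
    # split() drops the period from every sentence except the last; restore it.
    sentences = [p + "." for p in parts[:-1]] + [parts[-1]]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > chunk_size:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


demo = "Alpha one. Beta two. Gamma three."
for i, chunk in enumerate(split_on_sentences(demo, chunk_size=20)):
    print(i, repr(chunk))
```

In this sketch a single sentence longer than `chunk_size` is kept whole as an oversized chunk rather than cut mid-sentence; the real library may handle that case differently.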

3. Inspect extraction results

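# Bucket results by the status string returned for each fact
added     = [(n, s) for n, s in results if s == "added"]
updated   = [(n, s) for n, s in results if s == "updated"]
corrected = [(n, s) for n, s in results if s == "corrected"]
noops     = [(n, s) for n, s in results if s == "noop"]

print(
    f"Added: {len(added)}, Updated: {len(updated)}, "
    f"Corrected: {len(corrected)}, No-op: {len(noops)}"
)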

Status	Meaning
`added`	New fact stored as a new note
`updated`	Existing note updated with new information
`corrected`	Existing note corrected (factual conflict resolved)
`noop`	Fact already known; no action taken

4. Verify entities were extracted and graphed

relationships = mm.get_entity_relationships("actor", "volt typhoon")

for rel in relationships:
    print(f"  {rel['relationship']}: {rel['node']['entity_type']}:{rel['node']['entity_value']}")

get_entity_relationships() returns list[dict]. Each dict has:

rel['relationship'] — edge type (e.g. "uses", "targets")
rel['node']['entity_type'] — type of the related entity
rel['node']['entity_value'] — value of the related entity
rel['edge_properties'] — additional edge metadata
rel['note_id'] — source note that created this edge
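To read a longer relationship list at a glance, the dicts can be grouped by edge type. The sketch below relies only on the keys listed above; `group_by_relationship` is a hypothetical helper, and the sample data (including the `tool` and `cve` entity types and the note IDs) is hand-written for illustration, not real ZettelForge output.

```python
from collections import defaultdict


def group_by_relationship(relationships: list[dict]) -> dict[str, list[str]]:
    """Hypothetical helper: map each edge type to its 'type:value' targets."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for rel in relationships:
        node = rel["node"]
        grouped[rel["relationship"]].append(
            f"{node['entity_type']}:{node['entity_value']}"
        )
    return dict(grouped)


# Illustrative sample in the documented shape (values are assumptions).
sample_relationships = [
    {"relationship": "uses",
     "node": {"entity_type": "tool", "entity_value": "certutil"},
     "edge_properties": {}, "note_id": "note-1"},
    {"relationship": "uses",
     "node": {"entity_type": "tool", "entity_value": "netsh"},
     "edge_properties": {}, "note_id": "note-1"},
    {"relationship": "exploits",
     "node": {"entity_type": "cve", "entity_value": "CVE-2024-3094"},
     "edge_properties": {}, "note_id": "note-2"},
]

for edge_type, targets in group_by_relationship(sample_relationships).items():
    print(f"{edge_type}: {', '.join(targets)}")
```

With a live `MemoryManager`, the list returned by `mm.get_entity_relationships("actor", "volt typhoon")` could be passed in place of `sample_relationships`.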

Note

"Volt Typhoon" matches entity type actor via the heuristic regex in entity_indexer.py. Numbered group designations such as APT28 or UNC1234 use type intrusion_set instead.

5. Query the ingested data

# Semantic recall
notes = mm.recall(
    "What infrastructure does Volt Typhoon target?",
    domain="cti",
    k=5,
)

for note in notes:
    print(f"  {note.content.raw[:120]}")

# Synthesized answer (OSS: direct_answer format)
result = mm.synthesize(
    "Summarize Volt Typhoon activity in Q1 2026",
    format="direct_answer",
    k=10,
)

print(result["synthesis"]["answer"])
print(f"Confidence: {result['synthesis']['confidence']}")

Extended synthesis formats

synthesized_brief, timeline_analysis, and relationship_map require ThreatRecall.ai SaaS. Without it, ZettelForge silently falls back to direct_answer. Use format="direct_answer" in OSS scripts to be explicit about what you will receive.
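Because the fallback is silent, an OSS script may want to surface it before calling `synthesize()`. The following is a minimal sketch under that assumption: `resolve_format` and the two format sets are hypothetical names built from the format names listed above, and nothing here is part of the ZettelForge API.

```python
import warnings

# Format names taken from the text above; the grouping is this sketch's own.
OSS_FORMATS = {"direct_answer"}
EXTENDED_FORMATS = {"synthesized_brief", "timeline_analysis", "relationship_map"}


def resolve_format(requested: str) -> str:
    """Hypothetical guard: return the format an OSS install would actually use."""
    if requested in OSS_FORMATS:
        return requested
    if requested in EXTENDED_FORMATS:
        # Make the otherwise silent fallback visible to the caller.
        warnings.warn(
            f"{requested!r} is an extended format; using 'direct_answer' instead",
            stacklevel=2,
        )
        return "direct_answer"
    raise ValueError(f"Unknown synthesis format: {requested!r}")


print(resolve_format("direct_answer"))
```

The returned value could then be passed as `format=` to `mm.synthesize()`, so the script records which format it expects to receive.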

6. Adjust extraction sensitivity

results = mm.remember_report(
    content=report_content,
    source_url="https://example.com/report",
    published_date="2026-03-20",
    domain="cti",
    min_importance=2,   # Lower threshold — keep more facts
    max_facts=10,       # Facts per chunk (default)
    chunk_size=2000,    # Smaller chunks for denser reports
)

LLM cost per chunk

Each chunk makes LLM calls for both extraction (Phase 1) and update decisions (Phase 2). A 15,000-character report with chunk_size=3000 produces about 5 chunks (sentence-boundary splitting can yield slightly more), each with up to max_facts LLM calls. Budget roughly 2 seconds per fact with the default in-process GGUF model.
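The arithmetic above can be turned into a quick back-of-the-envelope estimate. This is a rough sketch only: `estimate_ingest_cost` is a hypothetical helper, the chunk count ignores sentence-boundary effects, and the 2-seconds-per-fact figure is the approximate budget quoted above rather than a measured value.

```python
import math


def estimate_ingest_cost(
    report_chars: int,
    chunk_size: int = 3000,
    max_facts: int = 10,
    seconds_per_fact: float = 2.0,  # rough budget quoted in the text
) -> tuple[int, int, float]:
    """Return (chunks, max per-fact LLM calls, estimated seconds)."""
    chunks = math.ceil(report_chars / chunk_size)
    max_fact_calls = chunks * max_facts
    return chunks, max_fact_calls, max_fact_calls * seconds_per_fact


chunks, calls, seconds = estimate_ingest_cost(15_000)
print(f"{chunks} chunks, up to {calls} per-fact calls, ~{seconds:.0f} s")
```

Lowering `max_facts` or raising `chunk_size` reduces the estimate, at the cost of extracting fewer facts or giving the model longer chunks to read.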

Parameters

Parameter	Default	Description
`content`	required	Full report text (any length)
`source_url`	`""`	Source URL; stored as provenance per chunk (`source_url:chunk:N`)
`published_date`	`""`	ISO 8601 date; passed to the extraction LLM as temporal context
`domain`	`"cti"`	Memory domain for retrieval scoping
`min_importance`	`3`	Discard extracted facts below this importance score (1–10)
`max_facts`	`10`	Maximum facts to extract per chunk
`chunk_size`	`3000`	Maximum characters per chunk before splitting
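Since `published_date` is expected to be an ISO 8601 date, it can help to normalize the value before ingestion. The sketch below uses only the Python standard library; `normalize_published_date` is a hypothetical helper, and whether ZettelForge itself validates the string is not stated here, so treat this as a caller-side precaution.

```python
from datetime import date


def normalize_published_date(value: str) -> str:
    """Hypothetical helper: return a YYYY-MM-DD string, or "" if no date is given.

    Raises ValueError when the value is not a valid ISO 8601 calendar date.
    """
    if not value:
        return ""  # matches the documented default of an empty string
    return date.fromisoformat(value).isoformat()


print(normalize_published_date("2026-03-20"))
```

The result can be passed directly as `published_date=` to `remember_report()`.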

LLM quick reference

Task: Ingest a long-form threat report with chunking, extraction, and deduplication.

Primary method: mm.remember_report(content, source_url="...", published_date="...", domain="cti") returns list[tuple[MemoryNote | None, str]].

Chunking: Content exceeding chunk_size characters is split on `. ` (period-space) boundaries. Each chunk runs independently through remember_with_extraction().

Two-phase pipeline per chunk: Phase 1 (LLM) extracts salient facts scored by importance. Phase 2 (LLM) compares each fact to existing notes and returns ADD/UPDATE/DELETE/NOOP.

KG access: get_entity_relationships(entity_type, entity_value) returns list[dict]; each dict has node (entity_type, entity_value), relationship, edge_properties, note_id.

Synthesis (OSS): mm.synthesize(query, format="direct_answer") returns result["synthesis"]["answer"], result["synthesis"]["confidence"], result["synthesis"]["sources"]. Extended formats require ThreatRecall.ai SaaS.