Store threat intelligence about an actor

Store threat actor intelligence with remember(). ZettelForge automatically extracts entities (intrusion sets, tools, CVEs, campaigns), normalizes them, and populates the knowledge graph with inferred relationships.

Prerequisites

ZettelForge installed (pip install zettelforge)
Embedding and LLM models available (downloaded automatically on first use)
TypeDB running (optional; falls back to JSONL graph)

Steps

1. Create a MemoryManager instance

from zettelforge.memory_manager import MemoryManager

mm = MemoryManager()

To use custom storage paths:

mm = MemoryManager(
    jsonl_path="/data/zettelforge/notes.jsonl",
    lance_path="/data/zettelforge/lance",
)

2. Store threat actor intelligence with `remember()`

content = (
    "APT28 (Fancy Bear) deployed Cobalt Strike beacons against NATO-aligned "
    "government networks in Q1 2026. The campaign exploited CVE-2024-3094, "
    "a critical backdoor in xz-utils, for initial access. Post-exploitation "
    "relied on Mimikatz for credential harvesting."
)

note, status = mm.remember(
    content=content,
    source_type="report",
    source_ref="mandiant-apt28-q1-2026",
    domain="cti",
)

print(f"Note ID: {note.id}")
print(f"Status: {status}")
print(f"Created: {note.created_at}")

The domain="cti" parameter triggers CTI-specific entity extraction and causal triple extraction (MAGMA-style) for richer graph edges.

3. Verify extracted entities

from zettelforge.entity_indexer import EntityExtractor

extractor = EntityExtractor()
entities = extractor.extract_all(content)

for entity_type, values in entities.items():
    if values:
        print(f"  {entity_type}: {values}")

Expected output:

  cve: ['cve-2024-3094']
  intrusion_set: ['apt28']
  tool: ['cobalt-strike', 'mimikatz']

Entity values are normalized to lowercase, and multi-word tool names are hyphenated (cobalt-strike). CVE IDs follow the same rule (cve-2024-3094). APT group designators such as APT28 are classified as intrusion_set, not actor; the actor type captures actors matched by common name (Lazarus, Sandworm, Volt Typhoon).
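The normalization rules above can be illustrated with a small standalone sketch. It assumes regex-based matching; the `PATTERNS` table and `extract_sketch()` are hypothetical stand-ins written for this example, not ZettelForge's actual patterns or API.

```python
import re

# Hypothetical patterns modeled on the entity types this guide describes.
# They are NOT ZettelForge's real regexes.
PATTERNS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "intrusion_set": re.compile(r"\b(?:APT|UNC|TA|FIN|TEMP)\d+\b", re.IGNORECASE),
    "tool": re.compile(r"\b(?:Cobalt Strike|Mimikatz|Metasploit)\b", re.IGNORECASE),
}


def extract_sketch(text):
    """Return sorted, de-duplicated, normalized matches per entity type."""
    found = {}
    for entity_type, pattern in PATTERNS.items():
        # Normalize: lowercase, then replace spaces with hyphens.
        values = {m.group(0).lower().replace(" ", "-") for m in pattern.finditer(text)}
        found[entity_type] = sorted(values)
    return found


sample = (
    "APT28 (Fancy Bear) deployed Cobalt Strike beacons. The campaign "
    "exploited CVE-2024-3094. Post-exploitation relied on Mimikatz."
)
result = extract_sketch(sample)
for entity_type, values in result.items():
    print(f"{entity_type}: {values}")
```

A free-text name such as "Fancy Bear" matches none of these structured patterns, so it produces no entity in this sketch.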

"Fancy Bear" does not appear in the extracted output above. Regex-based extraction matches structured designators like APT28 directly. Text aliases like "Fancy Bear" resolve to their canonical form only when TypeDB is running with alias-of relations populated, or when LLM NER is enabled with extract_all(content, use_llm=True).

4. Check the knowledge graph

Use the correct entity type (intrusion_set for APT groups) when querying. The return value is a list of dicts, each containing a "node" sub-dict and a "relationship" key:

relationships = mm.get_entity_relationships("intrusion_set", "apt28")

for rel in relationships:
    node = rel["node"]
    print(f"  {rel['relationship']}: {node['entity_type']}:{node['entity_value']}")

Expected output (after step 2 runs):

  USES_TOOL: tool:cobalt-strike
  USES_TOOL: tool:mimikatz
  EXPLOITS_CVE: cve:cve-2024-3094
  MENTIONED_IN: note:<note_id>
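For reporting, it can help to group the returned entries by relationship type. The sketch below works on a hand-written sample in the documented return shape; `group_by_relationship()` and the note ID `note-0001` are hypothetical, and a real call to `mm.get_entity_relationships()` would supply the list instead.

```python
from collections import defaultdict

# Hand-written sample in the documented shape (a "relationship" key plus a
# "node" sub-dict). The note ID is a made-up placeholder.
sample_relationships = [
    {"relationship": "USES_TOOL", "node": {"entity_type": "tool", "entity_value": "cobalt-strike"}},
    {"relationship": "USES_TOOL", "node": {"entity_type": "tool", "entity_value": "mimikatz"}},
    {"relationship": "EXPLOITS_CVE", "node": {"entity_type": "cve", "entity_value": "cve-2024-3094"}},
    {"relationship": "MENTIONED_IN", "node": {"entity_type": "note", "entity_value": "note-0001"}},
]


def group_by_relationship(relationships):
    """Map each relationship name to the sorted entity values it points at."""
    grouped = defaultdict(list)
    for rel in relationships:
        grouped[rel["relationship"]].append(rel["node"]["entity_value"])
    return {name: sorted(values) for name, values in grouped.items()}


grouped = group_by_relationship(sample_relationships)
for name, values in grouped.items():
    print(f"{name}: {values}")
```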

5. Traverse the graph from the actor

traverse_graph returns a list of paths. Each path is a list of step dicts with from_type, from_value, relationship, to_type, and to_value:

paths = mm.traverse_graph(
    start_type="intrusion_set",
    start_value="apt28",
    max_depth=2,
)

for path in paths:
    for step in path:
        print(
            f"  {step['from_type']}:{step['from_value']}"
            f" --[{step['relationship']}]-->"
            f" {step['to_type']}:{step['to_value']}"
        )

If TypeDB is not running, graph traversal uses the JSONL fallback. Relationship data is identical, but query performance degrades above ~50,000 edges.

max_depth is capped at 2 in the default installation. Passing a higher value logs a warning and traversal proceeds at depth 2.
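Multi-hop paths are often easier to read as one chain per path rather than one line per step. The following sketch formats a path in the step-dict shape described above; `format_path()` and the sample path are illustrative assumptions, not output captured from ZettelForge.

```python
def format_path(path):
    """Render one path (a list of step dicts) as a single arrow chain."""
    if not path:
        return ""
    first = path[0]
    parts = [f"{first['from_type']}:{first['from_value']}"]
    for step in path:
        parts.append(
            f"--[{step['relationship']}]--> {step['to_type']}:{step['to_value']}"
        )
    return " ".join(parts)


# Hypothetical depth-2 path in the documented shape.
sample_paths = [
    [
        {"from_type": "intrusion_set", "from_value": "apt28",
         "relationship": "USES_TOOL",
         "to_type": "tool", "to_value": "cobalt-strike"},
        {"from_type": "tool", "from_value": "cobalt-strike",
         "relationship": "EXPLOITS_CVE",
         "to_type": "cve", "to_value": "cve-2024-3094"},
    ],
]

chains = [format_path(path) for path in sample_paths]
for chain in chains:
    print(chain)
```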

6. Store with memory evolution

Use evolve=True to deduplicate against existing notes. The LLM extracts facts, compares them to existing memory, and returns the operation taken per fact:

note, status = mm.remember(
    content=(
        "Lazarus Group used Cobalt Strike and a custom loader called "
        "DTrack to target cryptocurrency exchanges in March 2026. "
        "CISA advisory AA26-078A links the campaign to CVE-2024-3094."
    ),
    domain="cti",
    evolve=True,
)
print(f"[{status}] {note.id}: {note.content.raw[:80]}")
# status: "created", "updated", "corrected", or "noop"
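When ingesting many reports, a tally of the returned statuses gives a quick view of how much was new versus already known. This is a minimal sketch over plain status strings; `summarize_statuses()` is a hypothetical helper, and the status names are the four listed in the comment above.

```python
from collections import Counter

# The four status strings named in this guide.
KNOWN_STATUSES = {"created", "updated", "corrected", "noop"}


def summarize_statuses(statuses):
    """Count each known status; raise on anything unexpected."""
    counts = Counter(statuses)
    unknown = set(counts) - KNOWN_STATUSES
    if unknown:
        raise ValueError(f"Unexpected status values: {sorted(unknown)}")
    return {status: counts.get(status, 0) for status in sorted(KNOWN_STATUSES)}


# Statuses as they might be collected from repeated remember(..., evolve=True) calls.
summary = summarize_statuses(["created", "noop", "updated", "noop"])
print(summary)
```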

LLM Quick Reference

Task: Store threat actor intelligence with automatic entity extraction and knowledge graph population.

Primary method: mm.remember(content, source_type="report", source_ref="...", domain="cti") returns (MemoryNote, str). Entities (intrusion sets, actors, tools, CVEs, campaigns, assets) are extracted automatically, values normalized, and graph edges created.

Entity extraction pipeline: Content passes through EntityExtractor().extract_all(), which applies a regex fast-path for CTI types. Each extracted value is normalized (lowercased, hyphenated) before indexing and graph storage.

Entity types for CTI content:

Entity type	Matches	Example value
`intrusion_set`	APT\d+, UNC\d+, TA\d+, FIN\d+, TEMP\d+	`apt28`
`actor`	Lazarus, Sandworm, Volt Typhoon	`lazarus`
`tool`	Cobalt Strike, Mimikatz, Metasploit, …	`cobalt-strike`
`cve`	CVE-YYYY-NNNNN	`cve-2024-3094`
`campaign`	Operation <name>	`operation aurora`
`attack_pattern`	T1234, T1234.001	`T1566`

Graph edges created automatically: USES_TOOL (intrusion_set/actor → tool), EXPLOITS_CVE (intrusion_set/actor → cve, tool → cve), TARGETS_ASSET (intrusion_set/actor/tool → asset), CONDUCTS_CAMPAIGN (intrusion_set/actor → campaign), MENTIONED_IN (all entities → note).
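The edge rules listed above can be read as a lookup from (source type, target type) to relationship name. The sketch below encodes them that way and applies them by naive co-occurrence within one note. Both the co-occurrence strategy and the `infer_edges()` helper are assumptions made for illustration; ZettelForge's real inference logic may differ. MENTIONED_IN edges are omitted for brevity.

```python
# (allowed source types, target type, relationship), transcribed from the list above.
EDGE_RULES = [
    (("intrusion_set", "actor"), "tool", "USES_TOOL"),
    (("intrusion_set", "actor", "tool"), "cve", "EXPLOITS_CVE"),
    (("intrusion_set", "actor", "tool"), "asset", "TARGETS_ASSET"),
    (("intrusion_set", "actor"), "campaign", "CONDUCTS_CAMPAIGN"),
]


def infer_edges(entities):
    """Pair every eligible source entity with every eligible target entity.

    ASSUMPTION: plain co-occurrence in a single note; not the library's algorithm.
    """
    edges = []
    for source_types, target_type, relationship in EDGE_RULES:
        for source_type in source_types:
            for source in entities.get(source_type, []):
                for target in entities.get(target_type, []):
                    edges.append(
                        (f"{source_type}:{source}", relationship, f"{target_type}:{target}")
                    )
    return edges


edges = infer_edges({
    "intrusion_set": ["apt28"],
    "tool": ["cobalt-strike", "mimikatz"],
    "cve": ["cve-2024-3094"],
})
for source, relationship, target in edges:
    print(f"{source} --[{relationship}]--> {target}")
```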

get_entity_relationships return shape: list[dict] where each dict has "relationship" (str), "node" (dict with entity_type, entity_value, properties), "edge_properties" (dict), and "note_id" (str).

traverse_graph return shape: list[list[dict]] — a list of paths. Each path is a list of step dicts with from_type, from_value, relationship, to_type, to_value.

Causal triples: For domain="cti" notes or content >200 chars, LLM-based causal triple extraction runs in the background, adding richer semantic edges to the graph.

Memory evolution: mm.remember(content, domain="cti", evolve=True) extracts facts, compares each against existing notes, and returns the operation taken. For programmatic batch use, remember_with_extraction() returns a list of (MemoryNote, status) tuples.

Alias resolution: Structured designators like APT28 normalize to apt28 automatically. Common name aliases ("Fancy Bear", "Pawn Storm") resolve to the canonical designator only when TypeDB is running with alias-of relations populated, or when LLM NER is enabled. Without these, use the canonical designator directly in your content.
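Without TypeDB aliases or LLM NER, one workaround is to rewrite known aliases to their canonical designators before storing the text. The sketch below does that with a small hand-maintained table; `ALIASES` and `canonicalize_aliases()` are hypothetical and not part of ZettelForge.

```python
import re

# Hypothetical, hand-maintained alias table (alias -> canonical designator).
ALIASES = {
    "Fancy Bear": "APT28",
    "Pawn Storm": "APT28",
}


def canonicalize_aliases(text, aliases=ALIASES):
    """Replace each known alias with its canonical designator, case-insensitively."""
    # Longest aliases first, so a longer alias is not clobbered by a shorter one.
    for alias, canonical in sorted(aliases.items(), key=lambda kv: -len(kv[0])):
        pattern = re.compile(rf"\b{re.escape(alias)}\b", re.IGNORECASE)
        text = pattern.sub(canonical, text)
    return text


cleaned = canonicalize_aliases(
    "Fancy Bear deployed Mimikatz, and Pawn Storm reused the same tooling."
)
print(cleaned)
```

The cleaned string can then be passed as `content` to `mm.remember()`, so the structured designator is picked up by regex extraction.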

Key parameters: domain="cti" activates CTI entity extraction. source_type accepts "conversation", "report", "task_output". source_ref is a free-text provenance string. tlp sets a TLP marking on the note ("WHITE", "GREEN", "AMBER", "RED").
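A thin wrapper can check these parameters before any call reaches the memory manager. This is a sketch under the assumption that the values listed above are the only accepted ones; `build_remember_kwargs()` is a hypothetical helper, and the source reference is a made-up placeholder.

```python
# Allowed values as listed in this guide.
ALLOWED_SOURCE_TYPES = {"conversation", "report", "task_output"}
ALLOWED_TLP = {"WHITE", "GREEN", "AMBER", "RED"}


def build_remember_kwargs(content, source_type, source_ref, tlp=None, domain="cti"):
    """Validate inputs and return keyword arguments for a remember() call."""
    if source_type not in ALLOWED_SOURCE_TYPES:
        raise ValueError(f"source_type must be one of {sorted(ALLOWED_SOURCE_TYPES)}")
    kwargs = {
        "content": content,
        "source_type": source_type,
        "source_ref": source_ref,
        "domain": domain,
    }
    if tlp is not None:
        tlp = tlp.upper()
        if tlp not in ALLOWED_TLP:
            raise ValueError(f"tlp must be one of {sorted(ALLOWED_TLP)}")
        kwargs["tlp"] = tlp
    return kwargs


kwargs = build_remember_kwargs(
    content="APT28 used Mimikatz for credential harvesting.",
    source_type="report",
    source_ref="example-report-001",  # placeholder provenance string
    tlp="amber",
)
print(kwargs)
```

With ZettelForge installed, the result would be used as `mm.remember(**kwargs)`.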