Ingest your first CTI report¶

What you will build: A fully indexed threat intelligence report with extracted STIX entities (threat actor, malware, vulnerability), queryable graph relationships, and a synthesized threat brief about APT28.

Time estimate: 15 minutes

Prerequisites:

You have completed Quickstart (Tutorial 01).
ZettelForge 2.7.0 is installed (pip install zettelforge).
An LLM provider is configured and running. Steps 3, 4, and 8 require LLM inference for fact extraction and synthesis. The default provider is local (Qwen2.5-3B-Instruct GGUF via llama-cpp-python, downloaded automatically). To use Ollama instead, set ZETTELFORGE_LLM_PROVIDER=ollama and run ollama pull qwen2.5:3b.
ZettelForge uses SQLite by default. No external database required.
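
If you would rather select the provider from inside Python than from the shell, one option is to set the variables before importing ZettelForge. The sketch below is a minimal helper, not part of ZettelForge: `configure_llm_provider` is a hypothetical name, and it assumes ZettelForge reads these variables when it is imported or when `MemoryManager` is constructed, which this tutorial does not confirm.

```python
import os
from typing import MutableMapping, Optional


def configure_llm_provider(
    provider: str,
    model: Optional[str] = None,
    env: MutableMapping[str, str] = os.environ,
) -> None:
    """Set the ZettelForge LLM environment variables named in this tutorial.

    Hypothetical helper: it only writes environment variables and does not
    call into ZettelForge itself.
    """
    env["ZETTELFORGE_LLM_PROVIDER"] = provider
    if model is not None:
        env["ZETTELFORGE_LLM_MODEL"] = model


# Assumption: call this before `from zettelforge import MemoryManager`.
configure_llm_provider("ollama", model="qwen2.5:3b")
```

Passing a plain dict as `env` lets you check the helper without touching the real process environment.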

Step 1: Start a Python session¶

Open a terminal and start a Python REPL.

python3

Import ZettelForge and create a MemoryManager instance.

from zettelforge import MemoryManager

mm = MemoryManager()

On the default SQLite backend, the manager initializes with no connection message. The storage backend is SQLiteBackend.

Note

TypeDB support for full STIX 2.1 entity types and deeper graph traversal is available on ThreatRecall.ai SaaS. To use it, set ZETTELFORGE_BACKEND=typedb and run a TypeDB instance on localhost:1729. This tutorial works with the default SQLite backend.

Step 2: Create a sample threat report¶

Define a threat intelligence report as a Python string. This report describes APT28 using Cobalt Strike to exploit CVE-2024-1111 against the energy sector.

report = """
THREAT INTELLIGENCE REPORT: APT28 Campaign Targeting Energy Sector
Published: 2026-03-15
TLP:AMBER

Russian state-sponsored threat actor APT28 (also known as Fancy Bear) has been
observed conducting a sustained cyber espionage campaign against energy sector
organizations in Western Europe. The campaign, active since January 2026, leverages
Cobalt Strike beacons delivered through spear-phishing emails containing weaponized
PDF attachments.

Technical Analysis:
The initial access vector exploits CVE-2024-1111, a critical remote code execution
vulnerability in PDF rendering libraries (CVSS 9.8). Upon successful exploitation,
the payload deploys a Cobalt Strike beacon configured to communicate with C2
infrastructure hosted on compromised legitimate websites.

APT28 operators use Cobalt Strike's built-in lateral movement capabilities to
pivot through victim networks, targeting operational technology (OT) network
segments connected to SCADA systems. The threat actor has been observed
exfiltrating engineering schematics and network topology documents from
compromised energy utilities.

Indicators of Compromise:
- C2 Domain: update-service.energy-grid[.]com
- Cobalt Strike Beacon Hash: a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4
- Exploit Payload Hash: 9f8e7d6c5b4a9f8e7d6c5b4a9f8e7d6c
- Spear-phishing Subject: "Q1 2026 Energy Market Compliance Update"

MITRE ATT&CK Mapping:
- T1566.001 - Spear-phishing Attachment
- T1203 - Exploitation for Client Execution (CVE-2024-1111)
- T1071.001 - Web Protocols (Cobalt Strike C2)
- T1021.002 - SMB/Windows Admin Shares (Lateral Movement)

Recommendations:
Patch CVE-2024-1111 immediately. Block the listed IOCs at network perimeter.
Monitor for anomalous SMB traffic between IT and OT network segments.
"""

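Before handing the report to an LLM, it can help to pull out the identifiers that a regular expression can find deterministically, so you have a baseline to compare against what the extraction returns later. The sketch below is independent of ZettelForge; the names `CVE_RE`, `TECHNIQUE_RE`, `unique_in_order`, and `refang` are illustrative, and `sample` is a short excerpt of the report so the block runs on its own.

```python
import re

# Short excerpt of the report from this step, so the block is self-contained.
sample = """
The initial access vector exploits CVE-2024-1111, a critical remote code execution
vulnerability in PDF rendering libraries (CVSS 9.8).
- C2 Domain: update-service.energy-grid[.]com
- T1566.001 - Spear-phishing Attachment
- T1203 - Exploitation for Client Execution (CVE-2024-1111)
"""

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")
TECHNIQUE_RE = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def unique_in_order(matches):
    """Deduplicate while preserving first-seen order."""
    return list(dict.fromkeys(matches))


def refang(indicator: str) -> str:
    """Turn a defanged indicator such as example[.]com back into example.com."""
    return indicator.replace("[.]", ".")


cves = unique_in_order(CVE_RE.findall(sample))
techniques = unique_in_order(TECHNIQUE_RE.findall(sample))

print(cves)
print(techniques)
print(refang("update-service.energy-grid[.]com"))
```

To run the same checks on the full report, pass `report` in place of `sample`.
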
Step 3: Ingest the report with `remember_report()`¶

Feed the report into ZettelForge. The remember_report() method chunks the text, runs two-phase LLM extraction on each chunk, and stores the results with CTI domain metadata.

Note

This step requires a configured LLM provider. If the LLM is unavailable, the method returns an empty list and logs a warning. Start Ollama (ollama serve) or configure the local GGUF provider before continuing.

results = mm.remember_report(
    content=report,
    source_url="https://intel.example.com/reports/apt28-energy-2026",
    published_date="2026-03-15",
    domain="cti",
    chunk_size=3000
)

Representative output (exact log lines depend on your LLM configuration):

[Extraction] Phase 1: Extracted 5 facts from chunk 0
[Extraction] Phase 2: ADD "APT28 is conducting cyber espionage against energy sector" (importance: 8)
[Extraction] Phase 2: ADD "APT28 uses Cobalt Strike beacons via spear-phishing PDFs" (importance: 9)
[Extraction] Phase 2: ADD "CVE-2024-1111 is a critical RCE in PDF rendering (CVSS 9.8)" (importance: 9)
[Extraction] Phase 2: ADD "APT28 targets OT/SCADA systems in energy utilities" (importance: 8)
[Extraction] Phase 2: ADD "C2 domain: update-service.energy-grid.com" (importance: 7)
[Causal] Extracted 3 triples, stored 3 edges for note note_20260409_...

Tip

The output shows both phases in action. Phase 1 (extraction) pulls candidate facts and scores them by importance (1-10). Phase 2 (update) compares each fact against existing memory and decides ADD, UPDATE, or NOOP.
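
To get a feel for what chunk_size controls, here is a minimal sketch of sentence-boundary chunking. It is not ZettelForge's implementation: `chunk_on_sentences` is a hypothetical function, the sentence splitter is deliberately naive, and the greedy packing strategy is an assumption.

```python
import re
from typing import List


def chunk_on_sentences(text: str, chunk_size: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters.

    Illustrative only: a sentence longer than chunk_size becomes its own
    oversized chunk rather than being split mid-sentence.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > chunk_size:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


demo = "APT28 targets energy firms. It uses Cobalt Strike. Patch CVE-2024-1111 now."
for i, chunk in enumerate(chunk_on_sentences(demo, chunk_size=50)):
    print(i, repr(chunk))
```

With chunk_size=50 the first two sentences share a chunk and the third starts a new one. With chunk_size=3000, as in the call above, a report of this length fits in a single chunk.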

Step 4: Inspect the extraction results¶

Print the results to see what ZettelForge created.

print(f"Total memory operations: {len(results)}\n")

for note, status in results:
    if note:
        print(f"[{status.upper()}] {note.id}")
        print(f"  Content:    {note.content.raw[:80]}...")
        print(f"  Domain:     {note.metadata.domain}")
        print(f"  Tier:       {note.metadata.tier}")
        print(f"  Importance: {note.metadata.importance}")
        print(f"  Keywords:   {note.semantic.keywords}")
        print(f"  Entities:   {note.semantic.entities}")
        print()

Representative output:

Total memory operations: 5

[ADDED] note_20260409_143201_a8f2
  Content:    APT28 is conducting a sustained cyber espionage campaign against energy sector o...
  Domain:     cti
  Tier:       B
  Importance: 8
  Keywords:   ['apt28', 'energy sector', 'cyber espionage', 'western europe']
  Entities:   ['APT28', 'energy sector']

[ADDED] note_20260409_143202_b3c7
  Content:    APT28 uses Cobalt Strike beacons delivered through spear-phishing emails with we...
  Domain:     cti
  Tier:       B
  Importance: 9
  Keywords:   ['apt28', 'cobalt strike', 'spear-phishing', 'beacon']
  Entities:   ['APT28', 'Cobalt Strike']

[ADDED] note_20260409_143203_d1e4
  Content:    CVE-2024-1111 is a critical remote code execution vulnerability in PDF rendering...
  Domain:     cti
  Tier:       B
  Importance: 9
  Keywords:   ['cve-2024-1111', 'rce', 'pdf rendering', 'cvss 9.8']
  Entities:   ['CVE-2024-1111']

[ADDED] note_20260409_143204_f5a9
  Content:    APT28 targets operational technology and SCADA systems in compromised energy util...
  Domain:     cti
  Tier:       B
  Importance: 8
  Keywords:   ['apt28', 'scada', 'ot', 'lateral movement']
  Entities:   ['APT28', 'SCADA']

[ADDED] note_20260409_143205_c2b8
  Content:    C2 domain update-service.energy-grid.com used by APT28 Cobalt Strike beacons...
  Domain:     cti
  Tier:       B
  Importance: 7
  Keywords:   ['c2', 'cobalt strike', 'ioc', 'domain']
  Entities:   ['APT28', 'Cobalt Strike']

Note

Your note IDs and timestamps will differ. The number of operations depends on how the LLM segments the facts. Expect between 4 and 6 ADDED operations.
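
Because the number of operations varies from run to run, a per-status tally is often easier to check than the full listing. The sketch below uses stand-in data shaped like the (note, status) tuples in this step; `summarize_operations` is a hypothetical helper, and pairing a None note with "noop" is an assumption made for illustration.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def summarize_operations(results: Iterable[Tuple[Optional[object], str]]) -> Counter:
    """Count memory operations by status, including entries whose note is None."""
    return Counter(status for _note, status in results)


# Stand-in data shaped like the (note, status) tuples described in this tutorial.
fake_results = [
    (SimpleNamespace(id="note_a"), "added"),
    (SimpleNamespace(id="note_b"), "added"),
    (None, "noop"),
]

summary = summarize_operations(fake_results)
if not summary:
    # An empty list is what the tutorial says to expect when the LLM is unavailable.
    print("No operations returned; check that the LLM provider is running.")
else:
    for status, count in sorted(summary.items()):
        print(f"{status}: {count}")
```

In your session, call `summarize_operations(results)` on the list returned in Step 3.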

Step 5: Query entity relationships from the knowledge graph¶

Check that ZettelForge built STIX entity relationships during ingestion. Use mm.get_entity_relationships() to query direct outgoing edges from an entity in the SQLite knowledge graph.

Note

APT-numbered groups (APT28, APT29, UNC series, FIN series) are indexed with entity_type="intrusion_set" by ZettelForge's entity extractor. Use "intrusion_set" as the entity type when querying them — "actor" returns a different (smaller) category.

actor_rels = mm.get_entity_relationships("intrusion_set", "apt28")
print("=== APT28 Relationships ===")
for rel in actor_rels:
    node = rel["node"]
    print(f"  APT28 --[{rel['relationship']}]--> {node['entity_type']}:{node['entity_value']}")

print()

tool_rels = mm.get_entity_relationships("tool", "cobalt-strike")
print("=== Cobalt Strike Relationships ===")
for rel in tool_rels:
    node = rel["node"]
    print(f"  Cobalt Strike --[{rel['relationship']}]--> {node['entity_type']}:{node['entity_value']}")

Expected output (shape verified from source; exact edges depend on LLM extraction results):

=== APT28 Relationships ===
  APT28 --[USES_TOOL]--> tool:cobalt-strike
  APT28 --[EXPLOITS_CVE]--> cve:CVE-2024-1111
  APT28 --[TARGETS_ASSET]--> asset:energy sector
  APT28 --[MENTIONED_IN]--> note:note_20260409_143201_a8f2
  APT28 --[MENTIONED_IN]--> note:note_20260409_143202_b3c7

=== Cobalt Strike Relationships ===
  Cobalt Strike --[EXPLOITS_CVE]--> cve:CVE-2024-1111
  Cobalt Strike --[MENTIONED_IN]--> note:note_20260409_143202_b3c7

Note

ZettelForge creates USES_TOOL, EXPLOITS_CVE, and TARGETS_ASSET edges from entity co-occurrence in the report text. The MENTIONED_IN edges link entities to the notes that mention them.
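
When an entity has many edges, grouping them by relationship name makes the output easier to scan and lets you set the MENTIONED_IN bookkeeping edges aside. This is a small sketch over stand-in dictionaries in the shape printed above; `group_by_relationship` is a hypothetical helper, not a ZettelForge function.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_relationship(rels: List[dict]) -> Dict[str, List[str]]:
    """Map each relationship name to the 'type:value' targets it points at."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for rel in rels:
        node = rel["node"]
        grouped[rel["relationship"]].append(f"{node['entity_type']}:{node['entity_value']}")
    return dict(grouped)


# Stand-in edges in the documented shape; real output depends on LLM extraction.
sample_rels = [
    {"relationship": "USES_TOOL", "node": {"entity_type": "tool", "entity_value": "cobalt-strike"}},
    {"relationship": "MENTIONED_IN", "node": {"entity_type": "note", "entity_value": "note_1"}},
    {"relationship": "MENTIONED_IN", "node": {"entity_type": "note", "entity_value": "note_2"}},
]

grouped = group_by_relationship(sample_rels)
# Drop the note-linking edges to leave only the threat-relevant ones.
semantic_only = {name: targets for name, targets in grouped.items() if name != "MENTIONED_IN"}
print(semantic_only)
```

In your session, pass `actor_rels` or `tool_rels` instead of `sample_rels`.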

Step 6: Use `recall()` to find the ingested intel¶

Search memory using natural language. The recall() method blends vector similarity search with graph traversal.

results = mm.recall("APT28 cobalt strike energy sector", domain="cti", k=5)

print(f"Found {len(results)} relevant notes:\n")
for note in results:
    print(f"  [{note.id}] (importance={note.metadata.importance})")
    print(f"    {note.content.raw[:100]}...")
    print()

Expected output (note IDs and ordering will differ):

Found 5 relevant notes:

  [note_20260409_143202_b3c7] (importance=9)
    APT28 uses Cobalt Strike beacons delivered through spear-phishing emails with we...

  [note_20260409_143201_a8f2] (importance=8)
    APT28 is conducting a sustained cyber espionage campaign against energy sector o...

  [note_20260409_143203_d1e4] (importance=9)
    CVE-2024-1111 is a critical remote code execution vulnerability in PDF rendering...

  [note_20260409_143204_f5a9] (importance=8)
    APT28 targets operational technology and SCADA systems in compromised energy util...

  [note_20260409_143205_c2b8] (importance=7)
    C2 domain update-service.energy-grid.com used by APT28 Cobalt Strike beacons...
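
This tutorial does not spell out how recall() combines its two signals. One common way to blend a vector score with a graph score is a weighted sum, sketched below on toy data. Everything here is an assumption for illustration: `blended_rank`, the `alpha` weight, the 2-d embeddings, and the graph scores are made up and are not taken from ZettelForge.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def blended_rank(
    query_vec: Sequence[float],
    note_vecs: Dict[str, Sequence[float]],
    graph_hits: Dict[str, float],
    alpha: float = 0.7,
) -> List[Tuple[str, float]]:
    """Rank note IDs by alpha * vector similarity + (1 - alpha) * graph score.

    alpha and the graph scores are made-up illustration values.
    """
    scored = [
        (note_id, alpha * cosine(query_vec, vec) + (1 - alpha) * graph_hits.get(note_id, 0.0))
        for note_id, vec in note_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-d embeddings; note_b is less similar to the query but linked in the graph.
ranking = blended_rank(
    query_vec=[1.0, 0.0],
    note_vecs={"note_a": [1.0, 0.0], "note_b": [0.6, 0.8], "note_c": [0.0, 1.0]},
    graph_hits={"note_b": 1.0},
)
for note_id, score in ranking:
    print(note_id, round(score, 2))
```

In this toy case the graph link lifts note_b above note_a even though note_a is the closer vector match, which is the kind of effect a blended ranking is meant to produce.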

Step 7: Walk the relationship chain with `traverse_graph()`¶

Traverse the knowledge graph starting from APT28 to discover the full attack chain.

paths = mm.traverse_graph(start_type="intrusion_set", start_value="apt28", max_depth=2)

print(f"Graph traversal found {len(paths)} paths:\n")
for i, path in enumerate(paths):
    chain = []
    for step in path:
        if not chain:
            chain.append(f"{step['from_type']}:{step['from_value']}")
        chain.append(f"--[{step['relationship']}]--> {step['to_type']}:{step['to_value']}")
    print(f"  Path {i+1}: {' '.join(chain)}")

Expected output (paths vary with stored data):

Graph traversal found 6 paths:

  Path 1: intrusion_set:apt28 --[USES_TOOL]--> tool:cobalt-strike
  Path 2: intrusion_set:apt28 --[USES_TOOL]--> tool:cobalt-strike --[EXPLOITS_CVE]--> cve:CVE-2024-1111
  Path 3: intrusion_set:apt28 --[EXPLOITS_CVE]--> cve:CVE-2024-1111
  Path 4: intrusion_set:apt28 --[TARGETS_ASSET]--> asset:energy sector
  Path 5: intrusion_set:apt28 --[MENTIONED_IN]--> note:note_20260409_143201_a8f2
  Path 6: intrusion_set:apt28 --[MENTIONED_IN]--> note:note_20260409_143202_b3c7

Path 2 shows the complete attack chain from APT28 through Cobalt Strike to CVE-2024-1111.
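
To see why a depth-2 traversal returns both one-step and two-step paths, here is a minimal depth-limited walk over a toy edge list. It is a sketch under stated assumptions, not ZettelForge's traversal code: `EDGES` is hand-written from a few of the edges shown above, and `enumerate_paths` is a hypothetical function that emits step dictionaries with the same keys used in this step.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (entity_type, entity_value)

# Toy edge list loosely based on the expected output above; not read from ZettelForge.
EDGES: Dict[Node, List[Tuple[str, Node]]] = {
    ("intrusion_set", "apt28"): [
        ("USES_TOOL", ("tool", "cobalt-strike")),
        ("EXPLOITS_CVE", ("cve", "CVE-2024-1111")),
        ("TARGETS_ASSET", ("asset", "energy sector")),
    ],
    ("tool", "cobalt-strike"): [
        ("EXPLOITS_CVE", ("cve", "CVE-2024-1111")),
    ],
}


def enumerate_paths(start: Node, max_depth: int) -> List[List[dict]]:
    """Return every outgoing path of 1..max_depth steps, depth-first."""
    paths: List[List[dict]] = []

    def walk(node: Node, prefix: List[dict]) -> None:
        if len(prefix) == max_depth:
            return  # The depth cap also guards against cycles running forever.
        for relationship, target in EDGES.get(node, []):
            step = {
                "from_type": node[0], "from_value": node[1],
                "relationship": relationship,
                "to_type": target[0], "to_value": target[1],
            }
            path = prefix + [step]
            paths.append(path)
            walk(target, path)

    walk(start, [])
    return paths


toy_paths = enumerate_paths(("intrusion_set", "apt28"), max_depth=2)
print(len(toy_paths))
```

The toy graph yields four paths: three direct edges from APT28 plus the two-step chain through Cobalt Strike. Each prefix is recorded as its own path, which is why the one-step USES_TOOL path and the two-step chain both appear.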

Step 8: Synthesize a threat brief¶

Use synthesize() to generate a direct-answer brief about APT28 from all ingested memory. The synthesis engine retrieves relevant notes, builds context, and produces a structured answer through the LLM.

Note

This step requires a configured LLM. Without one, synthesize() returns a fallback response in which synthesis["answer"] is "No specific answer found...", synthesis["confidence"] is 0.0, and synthesis["sources"] is []. Retrieval still runs, so sources are found and counted, but no LLM-generated answer is produced. The synthesized_brief format (structured themes and summary) is available on ThreatRecall.ai SaaS. This tutorial uses direct_answer, which is available in ZettelForge OSS.

result = mm.synthesize(
    query="What do we know about APT28?",
    format="direct_answer",
    k=10
)

synthesis = result["synthesis"]
meta = result["metadata"]

print("=== APT28 Direct Answer ===\n")
print(synthesis["answer"])
print(f"\nConfidence: {synthesis['confidence']}")
print(f"\nSources used: {meta['sources_count']}")
print(f"Model: {meta['model_used']}")
print(f"Latency: {meta['latency_ms']}ms")

Expected output (requires LLM):

=== APT28 Direct Answer ===

APT28 (Fancy Bear) is a Russian state-sponsored threat actor conducting cyber
espionage against Western European energy sector organizations since January 2026.
The campaign uses Cobalt Strike beacons via spear-phishing PDFs exploiting
CVE-2024-1111 (CVSS 9.8), targeting OT/SCADA network segments.

Confidence: 0.85

Sources used: 5
Model: qwen2.5:3b
Latency: 1240ms
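
If you call synthesize() from a script, you may want to detect the no-LLM fallback instead of printing it as if it were an answer. The check below relies only on the fallback fields described in the note above; `is_fallback` is a hypothetical helper and the two result dictionaries are stand-ins.

```python
def is_fallback(result: dict) -> bool:
    """Return True when a synthesize() result looks like the no-LLM fallback.

    Based only on the fallback fields this tutorial describes: a confidence of
    0.0 together with an empty synthesis["sources"] list.
    """
    synthesis = result.get("synthesis", {})
    return synthesis.get("confidence", 0.0) == 0.0 and not synthesis.get("sources")


# Stand-in results in the documented shape; values are illustrative.
fallback_result = {
    "synthesis": {"answer": "No specific answer found...", "confidence": 0.0, "sources": []}
}
llm_result = {
    "synthesis": {"answer": "APT28 ...", "confidence": 0.85, "sources": ["note_1"]}
}

print(is_fallback(fallback_result))
print(is_fallback(llm_result))
```

In your session, `is_fallback(result)` on the dictionary returned above tells you whether the LLM actually produced the answer.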

What you built¶

You ingested a raw threat intelligence report and turned it into structured, queryable knowledge:

Chunked ingestion -- remember_report() split the report and ran two-phase LLM extraction on each chunk
Fact extraction -- The LLM identified 5 high-importance facts and scored them
STIX entity creation -- The knowledge graph now has intrusion_set:apt28, tool:cobalt-strike, cve:CVE-2024-1111, and asset:energy sector with inferred relationships
Relationship query -- get_entity_relationships() retrieves direct KG edges for any entity
Graph traversal -- traverse_graph() walks the relationship chain to map the full attack path
Semantic recall -- recall() blends vector similarity and graph traversal for ranked retrieval
Synthesis -- synthesize() generates an LLM-backed answer grounded in your stored intelligence

Next steps¶

LLM quick reference¶

This section is for LLM agents consuming ZettelForge's CTI ingestion pipeline programmatically.

Ingest a report -- mm.remember_report(content, source_url, published_date, domain="cti", chunk_size=3000). Returns List[Tuple[Optional[MemoryNote], str]] where each tuple is (note, status) and status is one of "added", "updated", "corrected", "noop". Chunks on sentence boundaries, runs Phase 1 fact extraction (LLM scores importance 1-10, filters by min_importance=3), then Phase 2 update decisions (ADD/UPDATE/DELETE/NOOP against existing memory). Default max_facts=10 per chunk. Requires LLM.

Recall intel -- mm.recall(query, domain="cti", k=10). Returns List[MemoryNote] ranked by blended vector + graph score. Superseded notes excluded by default. For entity-specific lookup: mm.recall_entity(entity_type, entity_value) where entity_type is one of cve, actor, tool, campaign, sector. Shortcuts: mm.recall_cve("CVE-2024-1111"), mm.recall_actor("apt28"), mm.recall_tool("cobalt-strike").

Query entity relationships -- mm.get_entity_relationships(entity_type, entity_value). Returns List[Dict] where each item has node (dict with entity_type, entity_value), relationship (str), edge_properties (dict), and note_id (str). Edges are written to the SQLite KG during remember() and remember_report().

Traverse the graph -- mm.traverse_graph(start_type, start_value, max_depth=2). Returns List[List[Dict]] -- a list of paths, each path a list of steps. Each step has from_type, from_value, relationship, to_type, to_value. Depth capped at 2 in OSS. Valid start_type values: intrusion_set, actor, tool, cve, campaign, asset, note. APT-numbered groups (APT28, UNC series) are indexed as intrusion_set, not actor. Relationships: USES_TOOL, EXPLOITS_CVE, TARGETS_ASSET, CONDUCTS_CAMPAIGN, MENTIONED_IN, SUPERSEDES.

Synthesize a brief -- mm.synthesize(query, format="direct_answer", k=10). Returns dict with top-level keys query, format, synthesis, metadata, sources. For "direct_answer": synthesis["answer"] (str), synthesis["confidence"] (float 0-1), synthesis["sources"] (list). The "synthesized_brief", "timeline_analysis", and "relationship_map" formats are available on ThreatRecall.ai SaaS; OSS falls back to "direct_answer". Requires LLM for non-fallback output.

MemoryNote fields -- id (str), content.raw (str), semantic.keywords (list), semantic.entities (list), metadata.domain (str), metadata.tier (str, A/B/C), metadata.importance (int 1-10), metadata.confidence (float), links.superseded_by (optional str), links.supersedes (list).

Configuration -- Default LLM provider is local (Qwen2.5-3B-Instruct Q4_K_M GGUF via llama-cpp-python, in-process, downloads automatically). Default embedding is nomic-embed-text-v1.5-Q via fastembed (in-process). For Ollama: set ZETTELFORGE_LLM_PROVIDER=ollama and ZETTELFORGE_LLM_MODEL=qwen2.5:3b. Environment overrides: ZETTELFORGE_LLM_MODEL, ZETTELFORGE_LLM_PROVIDER, ZETTELFORGE_LLM_URL, TYPEDB_HOST, TYPEDB_PORT, TYPEDB_DATABASE.
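
The Phase 1 filter mentioned in the ingest entry above (min_importance=3, max_facts=10) can be pictured as a threshold followed by a cap. The sketch below is one plausible reading, not ZettelForge's code: `select_facts` is a hypothetical function, and ranking by importance before applying the cap is an assumption.

```python
from typing import List, Tuple

Fact = Tuple[str, int]  # (fact text, importance 1-10)


def select_facts(facts: List[Fact], min_importance: int = 3, max_facts: int = 10) -> List[Fact]:
    """Drop facts below min_importance, then keep the max_facts most important.

    sorted() is stable, so facts with equal importance keep their original order.
    """
    kept = [fact for fact in facts if fact[1] >= min_importance]
    return sorted(kept, key=lambda fact: fact[1], reverse=True)[:max_facts]


# Made-up candidate facts with importance scores, for illustration only.
candidates = [
    ("APT28 uses Cobalt Strike beacons", 9),
    ("Report is marked TLP:AMBER", 2),
    ("C2 domain observed", 7),
]
print(select_facts(candidates))
```

Here the TLP marking scores below the threshold and is dropped, while the other two facts pass through in importance order.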
