KG edge schema reference

Module: zettelforge.knowledge_graph

from zettelforge.knowledge_graph import KnowledgeGraph, get_knowledge_graph

Overview

ZettelForge maintains a knowledge graph of entities and typed relationships extracted from analyst notes and detection rules. The graph stores entity nodes and directed relationship edges with optional temporal indexing for time-based queries.

Two storage paths exist in parallel:

SQLite (default). When you use MemoryManager with the default backend, KG writes go to the kg_nodes and kg_edges tables in zettelforge.db. This is the primary path for most deployments.
JSONL in-process. The KnowledgeGraph class (below) writes to kg_nodes.jsonl and kg_edges.jsonl in the data directory. It is used when you call get_knowledge_graph() directly or when the TypeDB extension is not installed and the backend is explicitly set to something other than SQLite.

The edge field names and relationship types are identical in both paths. The SQLite edge table adds a note_id column (see below).

On ThreatRecall.ai SaaS, the graph backend is TypeDB, which provides a richer semantic layer. All relationship types documented here apply across backends.

Thread safety: all write operations in KnowledgeGraph are guarded by threading.RLock.

Node schema

Nodes are created automatically when edges reference them. The entity_type field identifies the kind of entity; the entity_value field holds its canonical string.

Node fields

Field	Type	Description
`node_id`	`str`	`node_<12-hex>` — internal primary key
`entity_type`	`str`	Entity type string (see taxonomy below)
`entity_value`	`str`	Canonical value, e.g. `apt28`, `CVE-2024-3094`
`properties`	`dict`	Application-specific metadata
`created_at`	`str`	ISO 8601
`updated_at`	`str`	ISO 8601

Entity type taxonomy

The tables below list the entity types that each subsystem writes. Strings are case-sensitive and are written exactly as shown.

CTI domain (written by MemoryManager._update_knowledge_graph()):

`entity_type`	Description
`note`	MemoryNote stored via `remember()`
`actor`	Threat actor via regex extraction
`threat_actor`	Threat actor via LLM NER
`intrusion_set`	Intrusion set (e.g. APT group)
`tool`	Malware or tool name
`cve`	Vulnerability identifier
`asset`	Target asset or sector
`campaign`	Campaign name
`attack_pattern`	MITRE ATT&CK technique (e.g. `T1059`)
`malware`	Malware family name
`person`	Person name (conversational domain)
`location`	Location (conversational domain)
`organization`	Organization name
`event`	Named event
`activity`	Activity name
`temporal`	Temporal reference extracted from text

Sigma and YARA domain (written by sigma/YARA ingest):

`entity_type`	Description
`SigmaRule`	Sigma detection rule
`YaraRule`	YARA detection rule
`SigmaTag`	Raw Sigma tag string
`YaraTag`	Raw YARA tag string
`LogSource`	Sigma logsource facet
`AttackPattern`	MITRE ATT&CK technique resolved from tag
`Vulnerability`	CVE resolved from tag
`IntrusionSet`	MITRE ATT&CK group resolved from sigma tag
`Malware`	MITRE ATT&CK software resolved from sigma tag
`ThreatActor`	Named threat actor resolved from YARA metadata

SQLite DDL

CREATE TABLE IF NOT EXISTS kg_nodes (
    node_id       TEXT PRIMARY KEY,
    entity_type   TEXT NOT NULL,
    entity_value  TEXT NOT NULL,
    properties    TEXT DEFAULT '{}',
    created_at    TEXT,
    updated_at    TEXT,
    UNIQUE(entity_type, entity_value)
);
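The UNIQUE(entity_type, entity_value) constraint is what makes a node's identity its type and value pair rather than its node_id. The sketch below shows this against an in-memory SQLite database. It is a minimal illustration, not ZettelForge code: insert_node() is a hypothetical helper and the node_id values are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS kg_nodes (
        node_id       TEXT PRIMARY KEY,
        entity_type   TEXT NOT NULL,
        entity_value  TEXT NOT NULL,
        properties    TEXT DEFAULT '{}',
        created_at    TEXT,
        updated_at    TEXT,
        UNIQUE(entity_type, entity_value)
    )
""")


def insert_node(node_id, entity_type, entity_value, properties=None):
    """Hypothetical helper: insert a row, return False on a UNIQUE conflict."""
    try:
        conn.execute(
            "INSERT INTO kg_nodes (node_id, entity_type, entity_value, properties) "
            "VALUES (?, ?, ?, ?)",
            (node_id, entity_type, entity_value, json.dumps(properties or {})),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = insert_node("node_0123456789ab", "actor", "apt28")         # new pair
duplicate = insert_node("node_ba9876543210", "actor", "apt28")     # same pair, rejected
case_variant = insert_node("node_00000000000c", "Actor", "apt28")  # different case, new pair
row_count = conn.execute("SELECT COUNT(*) FROM kg_nodes").fetchone()[0]
print(first, duplicate, case_variant, row_count)
```

Because SQLite compares TEXT with binary collation by default, actor and Actor count as different entity types here, which matches the case-sensitivity rule stated in the taxonomy.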

Edge schema

Each edge is a directed relationship between two nodes.

Edge fields

Field	Type	Description
`edge_id`	`str`	`edge_<12-hex>` — internal primary key
`from_node_id`	`str`	Source node `node_id`
`to_node_id`	`str`	Target node `node_id`
`relationship`	`str`	Semantic relationship type (see below)
`edge_type`	`str`	`heuristic` \| `causal` \| `detection` (default: `heuristic`)
`properties`	`dict`	Edge-specific metadata
`created_at`	`str`	ISO 8601
`updated_at`	`str`	ISO 8601

SQLite DDL

The SQLite path adds note_id, which links an edge back to the note that caused it. The unique constraint includes note_id, so the same entity pair can have the same relationship recorded from multiple notes.

CREATE TABLE IF NOT EXISTS kg_edges (
    edge_id       TEXT PRIMARY KEY,
    from_node_id  TEXT NOT NULL,
    to_node_id    TEXT NOT NULL,
    relationship  TEXT NOT NULL,
    edge_type     TEXT DEFAULT 'heuristic',
    note_id       TEXT DEFAULT '',
    properties    TEXT DEFAULT '{}',
    created_at    TEXT,
    updated_at    TEXT,
    UNIQUE(from_node_id, to_node_id, relationship, note_id)
);

The JSONL path deduplicates on (from_node_id, to_node_id, relationship) without note_id.
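The difference between the two deduplication keys can be seen with three observations of the same relationship, two of them from the same note. This is a sketch under assumptions: the node and note identifiers are placeholders, and INSERT OR IGNORE is used only to demonstrate the constraint, not because the library is known to write rows that way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS kg_edges (
        edge_id       TEXT PRIMARY KEY,
        from_node_id  TEXT NOT NULL,
        to_node_id    TEXT NOT NULL,
        relationship  TEXT NOT NULL,
        edge_type     TEXT DEFAULT 'heuristic',
        note_id       TEXT DEFAULT '',
        properties    TEXT DEFAULT '{}',
        created_at    TEXT,
        updated_at    TEXT,
        UNIQUE(from_node_id, to_node_id, relationship, note_id)
    )
""")

# (edge_id, from_node_id, to_node_id, relationship, note_id); placeholder values
observations = [
    ("edge_000000000001", "node_aaaaaaaaaaaa", "node_bbbbbbbbbbbb", "USES_TOOL", "note_1"),
    ("edge_000000000002", "node_aaaaaaaaaaaa", "node_bbbbbbbbbbbb", "USES_TOOL", "note_2"),
    ("edge_000000000003", "node_aaaaaaaaaaaa", "node_bbbbbbbbbbbb", "USES_TOOL", "note_1"),
]
for row in observations:
    conn.execute(
        "INSERT OR IGNORE INTO kg_edges "
        "(edge_id, from_node_id, to_node_id, relationship, note_id) "
        "VALUES (?, ?, ?, ?, ?)",
        row,
    )

# SQLite key includes note_id: note_1 and note_2 each keep one row.
sqlite_rows = conn.execute("SELECT COUNT(*) FROM kg_edges").fetchone()[0]

# JSONL-style key omits note_id: all three observations collapse to one edge.
jsonl_keys = {(src, dst, rel) for _, src, dst, rel, _ in observations}

print(sqlite_rows, len(jsonl_keys))
```

In practice this means edge counts are not directly comparable between the two storage paths.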

Edge type taxonomy

The edge_type field records how the edge was created:

`edge_type`	Description	Source
`heuristic`	Co-occurrence or heuristic extraction (default)	Entity co-occurrence during `remember()`
`causal`	LLM-extracted cause-and-effect triple	Causal triple extraction in slow-path enrichment
`detection`	Detection rule relationship	Sigma/YARA ingest

When add_edge() is called on an existing heuristic edge with a more specific edge_type, the type is promoted to the more specific value.
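The promotion rule can be read as a small pure function. The sketch below is one interpretation, not the library's code: promote_edge_type() is a hypothetical name, and leaving an already-specific type (causal or detection) unchanged is an assumption, since the text above only describes what happens to heuristic edges.

```python
def promote_edge_type(existing: str, incoming: str) -> str:
    """Return the edge_type an existing edge should carry after a repeat write."""
    # Only the default type is ever replaced, and only by something more specific.
    if existing == "heuristic" and incoming != "heuristic":
        return incoming
    # Assumption: a causal or detection edge is never demoted or swapped.
    return existing


print(promote_edge_type("heuristic", "causal"))
print(promote_edge_type("causal", "heuristic"))
```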

Relationship types

CTI entity relationships

Created automatically during remember() with domain="cti".

Relationship	From	To	Description
`MENTIONED_IN`	any entity	`note`	Entity appeared in a note
`USES_TOOL`	`actor`, `threat_actor`, `intrusion_set`	`tool`	Actor uses a specific tool
`EXPLOITS_CVE`	`actor`, `threat_actor`, `intrusion_set`, `tool`	`cve`	Entity exploits a vulnerability
`TARGETS_ASSET`	`actor`, `threat_actor`, `intrusion_set`, `tool`	`asset`	Entity targets an asset or sector
`CONDUCTS_CAMPAIGN`	`actor`, `threat_actor`, `intrusion_set`	`campaign`	Actor runs a campaign
`USES_TECHNIQUE`	`actor`, `threat_actor`, `intrusion_set`	`attack_pattern`	Actor uses an ATT&CK technique
`IMPLEMENTS`	`malware`	`attack_pattern`	Malware implements a technique

Conversational entity relationships (extracted when domain is not restricted to CTI):

Relationship	From	To	Description
`AFFILIATED_WITH`	`person`	`organization`	Person affiliated with an org
`ATTENDED`	`person`	`event`	Person attended an event
`LOCATED_AT`	`person`	`location`	Person located at a place
`PARTICIPATES_IN`	`person`	`activity`	Person participates in an activity
`HELD_AT`	`event`	`location`	Event held at a location
`ORGANIZED_BY`	`event`	`organization`	Event organized by an org
`OCCURRED_ON`	`event`	`temporal`	Event at a temporal reference
`BASED_IN`	`organization`	`location`	Organization based in a location

Note versioning:

Relationship	From	To	Description
`SUPERSEDES`	`note`	`note`	Newer note version supersedes the older one

LLM causal relationships

Created by NoteConstructor.store_causal_edges() during slow-path enrichment. These edges have edge_type="causal". Extraction requires a configured LLM provider.

Relationship	Description
`causes`	Subject is a direct cause of object
`enables`	Subject enables or facilitates object
`targets`	Subject targets object
`uses`	Subject uses object
`exploits`	Subject exploits object
`attributed_to`	Subject is attributed to object
`related_to`	Generic causal relationship

The relationship value is the lowercase string as validated against CAUSAL_RELATIONS (note_constructor.py:98). Node entity_type and entity_value for causal edges are free-form strings extracted by the LLM and are not constrained to the entity type taxonomy above.

Detection rule relationships

Created during Sigma and YARA ingest.

Relationship	From	To	Description
`applies_to`	`SigmaRule`	`LogSource`	Rule applies to a log source facet
`tagged_with`	`SigmaRule`, `YaraRule`	`SigmaTag`, `YaraTag`	Rule carries a tag
`detects`	`SigmaRule`, `YaraRule`	`AttackPattern`	Rule detects an ATT&CK technique
`references_cve`	`SigmaRule`, `YaraRule`	`Vulnerability`	Rule references a CVE
`attributed_to`	`SigmaRule`, `YaraRule`	`IntrusionSet`, `Malware`, `ThreatActor`	Rule attributed to a group or actor
`superseded_by`	`SigmaRule`	`SigmaRule`	Rule replaced by a newer version
`related_to`	`SigmaRule`	`SigmaRule`	Generic rule relationship

Temporal relationships

For tracking entity state over time. Used with add_temporal_edge().

Relationship	Description
`TEMPORAL_BEFORE`	State or event occurred before another
`TEMPORAL_AFTER`	State or event occurred after another
`SUPERSEDES`	New entity state supersedes an old one

Temporal edges are indexed automatically in _temporal_index and _entity_timeline on write.

Legacy schema normalization

Edges written by pre-v2.5.1 deployments used different key names. _normalize_edge_schema() rewrites them on load:

Legacy key	Canonical key
`source_id`	`from_node_id`
`target_id`	`to_node_id`
`relation_type`	`relationship`

Rules for rejected entries:

Entry missing edge_id: dropped.
Entry missing from_node_id, to_node_id, or relationship (and no recoverable legacy keys): dropped.
Malformed JSON line: skipped; total count logged at warning level as kg_edges_skipped_malformed.
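The key renaming and the rejection rules above can be sketched as a standalone loader. This is an illustration only and not the body of _normalize_edge_schema(): normalize_edge() and load_edges() are hypothetical names, and preferring an existing canonical key over its legacy twin is an assumption.

```python
import json
from typing import Optional

LEGACY_KEYS = {
    "source_id": "from_node_id",
    "target_id": "to_node_id",
    "relation_type": "relationship",
}
REQUIRED_KEYS = ("edge_id", "from_node_id", "to_node_id", "relationship")


def normalize_edge(entry) -> Optional[dict]:
    """Rename legacy keys; return None when the entry must be dropped."""
    if not isinstance(entry, dict):
        return None
    edge = dict(entry)
    for legacy, canonical in LEGACY_KEYS.items():
        if legacy in edge:
            value = edge.pop(legacy)
            # Assumption: a canonical key already present wins over the legacy one.
            edge.setdefault(canonical, value)
    if any(not edge.get(key) for key in REQUIRED_KEYS):
        return None
    return edge


def load_edges(lines):
    """Parse JSONL lines, returning (edges, number of malformed lines skipped)."""
    edges, malformed = [], 0
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            malformed += 1
            continue
        edge = normalize_edge(entry)
        if edge is not None:
            edges.append(edge)
    return edges, malformed


sample_lines = [
    '{"edge_id": "edge_1", "source_id": "node_a", "target_id": "node_b", "relation_type": "USES_TOOL"}',
    '{"from_node_id": "node_a", "to_node_id": "node_b", "relationship": "USES_TOOL"}',
    '{not json',
]
edges, malformed = load_edges(sample_lines)
print(len(edges), malformed)
```

The first sample line is recovered through its legacy keys, the second is dropped for lacking edge_id, and the third is counted as malformed.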

This was a production hotfix (v2.5.1) for long-running deployments where pre-v2.5.x writers had written roughly 80k or more legacy entries alongside canonical-shape rows in kg_edges.jsonl.

KnowledgeGraph class API

Use KnowledgeGraph directly when you need in-process JSONL graph access, or call get_knowledge_graph() for the shared singleton.

add_node()

def add_node(self, entity_type: str, entity_value: str, properties: dict | None = None) -> str

Creates or updates a node. Returns node_id. If the node already exists, merges properties and refreshes updated_at.

add_edge()

def add_edge(
    self,
    from_type: str,
    from_value: str,
    to_type: str,
    to_value: str,
    relationship: str,
    properties: dict | None = None,
) -> str

Creates or updates a directed edge. Auto-creates both endpoint nodes. Deduplicates on (from_node_id, to_node_id, relationship). If a duplicate exists with edge_type="heuristic" and you pass a more specific edge_type in properties, it is promoted.

get_node()

def get_node(self, entity_type: str, entity_value: str) -> dict | None

Looks up a node by type and value. Returns None if not found.

get_node_by_id()

def get_node_by_id(self, node_id: str) -> dict | None

Looks up a node by its internal node_id.

get_outgoing_edges()

def get_outgoing_edges(self, node_id: str) -> list[dict]

Returns all outgoing edges for a node_id. Each dict contains at minimum edge_id, from_node_id, to_node_id, relationship, properties, created_at, updated_at.

get_neighbors()

def get_neighbors(
    self, entity_type: str, entity_value: str, relationship: str | None = None
) -> list[dict]

Returns adjacent nodes reachable via outgoing edges, with optional relationship filter. Each result contains node (the target node dict), relationship, and edge_properties.

traverse()

def traverse(self, start_type: str, start_value: str, max_depth: int = 2) -> list[dict]

Depth-first traversal up to max_depth. Returns a list of paths; each path is a list of step dicts with keys from_type, from_value, relationship, to_type, to_value.
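To make the return shape concrete, here is a standalone depth-first sketch that produces paths of step dicts with the keys listed above. It is not the library's traverse(): the function name and the list-of-steps input are hypothetical, the entity values are placeholders, and both the cycle check and the choice to emit every path prefix are assumptions.

```python
def traverse_sketch(steps, start_type, start_value, max_depth=2):
    """Enumerate outgoing paths depth-first, at most max_depth steps long."""
    outgoing = {}
    for step in steps:
        outgoing.setdefault((step["from_type"], step["from_value"]), []).append(step)

    paths = []

    def visit(node, path, seen):
        if len(path) >= max_depth:
            return
        for step in outgoing.get(node, []):
            target = (step["to_type"], step["to_value"])
            if target in seen:  # assumption: never revisit a node on the same path
                continue
            extended = path + [step]
            paths.append(extended)
            visit(target, extended, seen | {target})

    start = (start_type, start_value)
    visit(start, [], {start})
    return paths


# Placeholder graph: actor -> tool -> cve
steps = [
    {"from_type": "actor", "from_value": "alpha", "relationship": "USES_TOOL",
     "to_type": "tool", "to_value": "beta"},
    {"from_type": "tool", "from_value": "beta", "relationship": "EXPLOITS_CVE",
     "to_type": "cve", "to_value": "CVE-0000-0001"},
]
paths = traverse_sketch(steps, "actor", "alpha", max_depth=2)
for path in paths:
    print(" -> ".join(f"[{s['relationship']}] {s['to_type']}:{s['to_value']}" for s in path))
```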

Temporal indexing

add_temporal_edge()

def add_temporal_edge(
    self,
    from_type: str,
    from_value: str,
    to_type: str,
    to_value: str,
    relationship: str,    # TEMPORAL_BEFORE | TEMPORAL_AFTER | SUPERSEDES
    timestamp: str,
    properties: dict | None = None,
) -> str

Creates an edge and indexes it in _temporal_index (keyed by timestamp string) and _entity_timeline (keyed by "entity_type:entity_value").

Timestamp formats accepted by _parse_timestamp():

ISO 8601 (including Z suffix)
%Y-%m-%d
%Y-%m-%d %H:%M:%S
%d %b %Y (e.g. 15 Jan 2026)
%B %d, %Y (e.g. January 15, 2026)
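A parser that accepts the formats listed above could look like the following. This is a sketch and not the source of _parse_timestamp(): parse_timestamp_sketch() is a hypothetical name, and trying ISO 8601 first before the strptime patterns is an assumed order.

```python
from datetime import datetime

STRPTIME_FORMATS = (
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M:%S",
    "%d %b %Y",
    "%B %d, %Y",
)


def parse_timestamp_sketch(value: str) -> datetime:
    """Parse a timestamp in any of the listed formats or raise ValueError."""
    text = value.strip()
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    iso_text = text[:-1] + "+00:00" if text.endswith("Z") else text
    try:
        return datetime.fromisoformat(iso_text)
    except ValueError:
        pass
    for fmt in STRPTIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {value!r}")


for sample in ("2026-01-15T12:00:00Z", "2026-01-15", "15 Jan 2026", "January 15, 2026"):
    print(sample, "->", parse_timestamp_sketch(sample).isoformat())
```

One caveat of this sketch: a Z-suffixed input yields a timezone-aware datetime while the other formats yield naive ones, and Python refuses to order the two against each other, so code that sorts mixed inputs would need to normalize them first. The %b and %B patterns also depend on the process locale.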

get_entity_timeline()

def get_entity_timeline(self, entity_type: str, entity_value: str) -> list[dict]

Returns the chronological state history for an entity. Each entry: {"edge": dict, "timestamp": str, "to_entity": "type:value"}.

get_changes_since()

def get_changes_since(self, timestamp: str) -> list[dict]

Returns all temporal entity changes at or after timestamp. Results are sorted chronologically. Each entry: {"timestamp": str, "from": "type:value", "relationship": str, "to": "type:value"}.
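The "at or after, sorted chronologically" behavior can be sketched over a plain dict standing in for the temporal index. Everything here is an assumption made for illustration: the index layout, the changes_since_sketch() name, and the placeholder entities are not taken from the library.

```python
from datetime import datetime

# Hypothetical stand-in for the temporal index: timestamp string -> entries.
temporal_index = {
    "2026-02-03": [{"timestamp": "2026-02-03", "from": "actor:alpha-v2",
                    "relationship": "SUPERSEDES", "to": "actor:alpha"}],
    "2025-12-20": [{"timestamp": "2025-12-20", "from": "event:recon",
                    "relationship": "TEMPORAL_BEFORE", "to": "event:intrusion"}],
    "2026-01-15": [{"timestamp": "2026-01-15", "from": "actor:alpha",
                    "relationship": "TEMPORAL_BEFORE", "to": "campaign:example"}],
}


def changes_since_sketch(index, since):
    """Return entries whose timestamp is at or after `since`, oldest first."""
    cutoff = datetime.fromisoformat(since)
    hits = [
        entry
        for stamp, entries in index.items()
        if datetime.fromisoformat(stamp) >= cutoff  # inclusive lower bound
        for entry in entries
    ]
    return sorted(hits, key=lambda entry: datetime.fromisoformat(entry["timestamp"]))


recent = changes_since_sketch(temporal_index, "2026-01-01")
for change in recent:
    print(f"[{change['timestamp']}] {change['from']} {change['relationship']} {change['to']}")
```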

get_latest_state()

def get_latest_state(self, entity_type: str, entity_value: str) -> dict | None

Returns the most recent temporal edge entry for an entity, or None if no temporal data exists.

Causal edge queries

Causal edges (edge_type: "causal") represent LLM-extracted cause-and-effect triples. Two dedicated methods traverse them.

get_causal_edges()

def get_causal_edges(
    self, entity_type: str, entity_value: str,
    max_depth: int = 3, max_visited: int = 50,
) -> list[dict]

BFS over outgoing causal edges — traces forward from cause to effects. Useful for "what does this actor or event lead to?" queries.

get_incoming_causal()

def get_incoming_causal(
    self, entity_type: str, entity_value: str,
    max_depth: int = 3, max_visited: int = 50,
) -> list[dict]

BFS over incoming causal edges — traces back to root causes. Useful for "why did this happen?" queries.
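The two methods differ only in direction, which a single bounded breadth-first search can illustrate. The sketch below is not the library's implementation: causal_bfs_sketch(), the flat edge dicts with from and to keys, and the exact way max_depth and max_visited cut off the search are all assumptions.

```python
from collections import deque


def causal_bfs_sketch(edges, start, max_depth=3, max_visited=50, incoming=False):
    """Collect causal edges reachable from `start`, breadth-first."""
    adjacency = {}
    for edge in edges:
        if edge.get("edge_type") != "causal":
            continue  # heuristic and detection edges are ignored
        key = edge["to"] if incoming else edge["from"]
        adjacency.setdefault(key, []).append(edge)

    visited = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for edge in adjacency.get(node, []):
            found.append(edge)
            nxt = edge["from"] if incoming else edge["to"]
            # Assumption: max_visited caps how many distinct nodes get expanded.
            if nxt not in visited and len(visited) < max_visited:
                visited.add(nxt)
                queue.append((nxt, depth + 1))
    return found


# Placeholder graph: a causes b, b enables c, plus one non-causal edge.
edges = [
    {"from": "event:a", "to": "event:b", "relationship": "causes", "edge_type": "causal"},
    {"from": "event:b", "to": "event:c", "relationship": "enables", "edge_type": "causal"},
    {"from": "event:a", "to": "tool:d", "relationship": "USES_TOOL", "edge_type": "heuristic"},
]
forward = causal_bfs_sketch(edges, "event:a")
backward = causal_bfs_sketch(edges, "event:c", incoming=True)
print([e["relationship"] for e in forward])
print([e["relationship"] for e in backward])
```

Both bounds matter on LLM-extracted graphs, where free-form entity strings can produce long or cyclic chains.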

Global singleton

def get_knowledge_graph() -> KnowledgeGraph

Returns the process-global KnowledgeGraph instance. Checks ZETTELFORGE_BACKEND (default: sqlite):

If ZETTELFORGE_BACKEND=typedb and the TypeDB extension is installed, uses TypeDB. On ThreatRecall.ai SaaS, TypeDB is the active KG backend.
Otherwise: uses the JSONL KnowledgeGraph instance.

Note: when you use MemoryManager with the default SQLite backend, KG writes go directly to the SQLite tables via self.store.add_kg_edge(), not through get_knowledge_graph(). Call get_knowledge_graph() for direct graph access independent of the memory manager.
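The selection rule can be summarized as a small decision function. This sketch assumes ZETTELFORGE_BACKEND is read as an environment variable and compared as an exact lowercase string; select_graph_backend() and its typedb_installed flag are hypothetical and only model the two branches described above.

```python
import os


def select_graph_backend(env=None, typedb_installed=False):
    """Return which graph implementation the singleton would hand back."""
    env = os.environ if env is None else env
    backend = env.get("ZETTELFORGE_BACKEND", "sqlite")
    if backend == "typedb" and typedb_installed:
        return "typedb"
    # Every other case, including the default "sqlite", falls through to
    # the in-process JSONL KnowledgeGraph.
    return "jsonl"


print(select_graph_backend({}))
print(select_graph_backend({"ZETTELFORGE_BACKEND": "typedb"}, typedb_installed=True))
print(select_graph_backend({"ZETTELFORGE_BACKEND": "typedb"}, typedb_installed=False))
```

The practical consequence is that, under the default configuration, the singleton and MemoryManager operate on different stores.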

Examples

from zettelforge.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph()

# Add nodes and an edge
kg.add_node("actor", "apt28")
kg.add_node("tool", "cobalt strike")
edge_id = kg.add_edge("actor", "apt28", "tool", "cobalt strike", "USES_TOOL")

# Query neighbors
neighbors = kg.get_neighbors("actor", "apt28")
for n in neighbors:
    print(f"{n['relationship']} -> {n['node']['entity_type']}:{n['node']['entity_value']}")
# Output: USES_TOOL -> tool:cobalt strike

# Traverse up to depth 2
paths = kg.traverse("actor", "apt28", max_depth=2)
for path in paths:
    print(" -> ".join(
        f"{p['from_type']}:{p['from_value']} [{p['relationship']}] {p['to_type']}:{p['to_value']}"
        for p in path
    ))

from zettelforge.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph()

# Temporal edge: record that an event preceded another
kg.add_temporal_edge(
    from_type="actor", from_value="apt28",
    to_type="campaign", to_value="nato-phishing-2026-q1",
    relationship="TEMPORAL_BEFORE",
    timestamp="2026-01-15",
)

# What changed since the start of Q1?
changes = kg.get_changes_since("2026-01-01")
for c in changes:
    print(f"[{c['timestamp']}] {c['from']} {c['relationship']} {c['to']}")

# Causal chain: forward (cause → effects)
effects = kg.get_causal_edges("actor", "apt28", max_depth=3)

# Causal chain: backward (root cause analysis)
root_causes = kg.get_incoming_causal("campaign", "nato-phishing-2026-q1", max_depth=3)