Configure Sigma Rule Ingestion¶
Ingest Sigma detection rules (SigmaHQ format) into ZettelForge memory. Each rule is parsed, validated against the vendored SigmaHQ JSON schema, mapped to a SigmaRule entity with typed knowledge graph relations, and persisted as a memory note.
Prerequisites¶
- ZettelForge installed (
pip install zettelforge) - Sigma rule files in
.ymlor.yamlformat (SigmaHQ specification V2.0.0) - Embedding and LLM models available (download automatically on first use)
Steps¶
1. Use the CLI (quick start)¶
Dry-run a directory to validate rules before ingesting:
Output shows each parsed rule with its id, rule type, tag count, and relation count:
OK /path/to/sigma/rules/proc_creation_win_whoami.yml id=sigma_a1b2c3d4e5f67890 type=detection tags=3 edges=6
OK /path/to/sigma/rules/susp_ps_execution.yml id=55043c5f-3c72-4fb6-aa22-70b6f7e98d4a type=detection tags=5 edges=9
Dry-run summary: 2/2 parsed, 0 failed.
Live ingestion into a MemoryManager:
2. Use the Python API¶
from zettelforge import MemoryManager
from zettelforge.sigma import ingest_rule, ingest_rules_dir
mm = MemoryManager()
# Ingest a single file
note, relations = ingest_rule(
"/path/to/rule.yml",
mm,
domain="detection",
)
print(f"Ingested: {note.id}, relations: {len(relations)}")
# Ingest a directory (walks recursively)
ingested, skipped = ingest_rules_dir(
"/path/to/sigma/rules/",
mm,
glob="**/*.yml",
domain="detection",
)
print(f"Ingested: {ingested}, skipped: {skipped}")
3. Parse without persisting¶
For validation pipelines or custom workflows, parse rules without memory storage:
from zettelforge.sigma import parse_file, parse_yaml, from_rule_dict
# Parse from file
rule_dict = parse_file("rule.yml")
# Parse from YAML string
yaml_text = """
title: Suspicious Whoami Execution
logsource:
category: process_creation
product: windows
detection:
selection:
Image|endswith: '\\whoami.exe'
condition: selection
"""
rule_dict = parse_yaml(yaml_text)
# Map to entity and KG relations
entity, relations = from_rule_dict(rule_dict)
print(f"Rule: {entity.title}")
print(f"Logsource: product={entity.logsource_product}, category={entity.logsource_category}")
print(f"Relation types: {set(r['rel'] for r in relations)}")
4. Accept multiple input types¶
ingest_rule() accepts a parsed dict, raw YAML string, or Path:
# Dict (pre-parsed)
note, rels = ingest_rule(rule_dict, mm)
# YAML string
note, rels = ingest_rule(yaml_text, mm)
# Path object
from pathlib import Path
note, rels = ingest_rule(Path("rule.yml"), mm)
# File path as string (auto-detected if it looks like a path)
note, rels = ingest_rule("rule.yml", mm)
5. Validate against the Sigma schema¶
Run standalone validation without entity mapping:
from zettelforge.sigma import validate, parse_yaml, SigmaValidationError
rule = parse_yaml(yaml_text)
result = validate(rule)
if not result.valid:
for error in result.errors:
print(f" Validation error: {error}")
6. Understand idempotency¶
Re-ingesting an unchanged rule returns the original note. The source_ref follows the pattern sigma:<rule_id>:<content_sha256_prefix> and is checked before any write:
first, _ = ingest_rule("rule.yml", mm)
second, _ = ingest_rule("rule.yml", mm)
assert first.id == second.id # same note, no duplicate
7. CLI flags reference¶
usage: python -m zettelforge.sigma.ingest [-h] [--domain DOMAIN] [--dry-run] [--glob GLOB] path
positional arguments:
path Sigma rule file or directory
options:
--domain DOMAIN Memory domain for ingested notes (default: detection)
--dry-run Parse + validate + map without persisting to memory
--glob GLOB Glob used when path is a directory (default: **/*.yml)
LLM Quick Reference¶
Task: Ingest Sigma rules into ZettelForge memory with automatic knowledge graph population.
Primary CLI: python -m zettelforge.sigma.ingest <path> with optional --dry-run, --domain, --glob.
Primary Python API: ingest_rule(source, mm, domain="detection") returns (MemoryNote, relations_list). Accepts dict, string, or Path. ingest_rules_dir(path, mm) walks a directory tree and returns (ingested_count, skipped_count).
Pipeline: Input -> parse_file() or parse_yaml() (YAML load + JSON-schema validation against vendored SigmaHQ schemas) -> from_rule_dict() (map to SigmaRule entity + relations) -> mm.remember() (persist as memory note) -> KG edge persistence.
Validation: validate(rule_dict) returns ValidationResult(valid, errors). Two error types: SigmaParseError (bad YAML or I/O) and SigmaValidationError (schema violation). Both bubble through ingest_rule().
Schema dispatch: Detection rules validate against sigma-detection-rule-schema.json. Rules with a correlation: key use the correlation schema. Rules with a filter: key use the filters schema.
Tag resolution: Sigma tags (attack.t1059, cve.2021-44228) upgrade to typed KG edges (detects -> AttackPattern, references_cve -> Vulnerability, attributed_to -> IntrusionSet/Malware). Raw tagged_with -> SigmaTag edges are always preserved alongside upgrade edges. tlp.* and detection.* tags are metadata-only (no upgrade).
Logsource edges: Every populated logsource facet (product, service, category) generates an applies_to -> LogSource edge with the facet type and value in properties.
Related rule edges: The related: block maps to superseded_by (type: obsolete) or related_to (all other types).
Idempotency: Source ref pattern sigma:<rule_id>:<content_sha256[:12]>. A store lookup precedes every write. Unchanged rules return the original note.
Security: File size capped at 1 MB. Symlinks are never followed during directory walk. Paths that resolve outside the rules root are skipped.
Edge tagging: All KG edges emitted during ingest carry edge_type: detection and source: sigma_ingest properties for downstream filtering.