Sigma schema reference¶

Modules: zettelforge.sigma, zettelforge.sigma.parser, zettelforge.sigma.entities, zettelforge.sigma.tags, zettelforge.sigma.ingest

ZettelForge 2.7.0 — Apache-2.0 license

from zettelforge.sigma import (
    SigmaRule, SigmaParseError, SigmaValidationError,
    parse_yaml, parse_file, validate,
    from_rule_dict, rule_to_entities, resolve_sigma_tag,
    ingest_rule, ingest_rules_dir,
)

What this reference covers¶

The Sigma subsystem has three layers:

Layer	Module	Responsibility
Schema	`sigma.parser` / `sigma.schemas`	YAML parsing + JSON-schema validation
Entity	`sigma.entities` / `sigma.tags`	Map a validated rule dict to `SigmaRule` + KG edges
Ingest	`sigma.ingest`	Orchestrate parse → entity → remember → persist

This page covers all three layers. The CLI wraps the ingest layer; see the CLI section below.

Vendored schemas¶

Three SigmaHQ JSON schemas live in src/zettelforge/sigma/schemas/. Schema selection is automatic based on the top-level keys of the parsed rule dict.

File	Schema title	When used
`sigma-detection-rule-schema.json`	Sigma rule specification V2.0.0	Default; no `correlation` or `filter` key
`sigma-correlation-rules-schema.json`	Sigma correlation rules	`correlation` key present
`sigma-filters-schema.json`	Sigma filters	`filter` key present

SigmaHQ specification version: V2.0.0 (2024-08-08). JSON Schema draft: 2020-12.

Schemas load lazily and cache in _SCHEMA_CACHE. The importlib.resources API locates them inside the zettelforge.sigma.schemas package so they install with the package and require no separate download.

Schema dispatch¶

# from sigma/parser.py
def _pick_schema(rule: dict[str, Any]) -> dict[str, Any]:
    if isinstance(rule, dict):
        if "correlation" in rule:
            return _load_schema("sigma-correlation-rules-schema.json")
        if "filter" in rule:
            return _load_schema("sigma-filters-schema.json")
    return _load_schema("sigma-detection-rule-schema.json")

Dispatch is ordered: correlation check before filter check, filter before detection. A rule with both correlation and filter is dispatched to the correlation schema.
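As a quick illustration of that precedence, the standalone sketch below mirrors the key checks of `_pick_schema` but returns only the schema filename instead of loading it. The helper name `pick_schema_name` is hypothetical and is not part of zettelforge.

```python
from typing import Any


def pick_schema_name(rule: Any) -> str:
    """Return the vendored schema filename that the dispatch order would select."""
    if isinstance(rule, dict):
        if "correlation" in rule:  # checked first, so it wins over "filter"
            return "sigma-correlation-rules-schema.json"
        if "filter" in rule:
            return "sigma-filters-schema.json"
    # Plain detection rules (and non-dict input) fall through to the default.
    return "sigma-detection-rule-schema.json"


both_keys = {"title": "example", "correlation": {}, "filter": {}}
print(pick_schema_name(both_keys))
```

Because the check is on key presence only, an empty `correlation` mapping still routes the rule to the correlation schema, where validation can then reject it.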

Detection rule schema¶

Required top-level keys: title, logsource, detection.

Top-level properties¶

Property	Type	Required	Description
`title`	string (max 256)	yes	Brief description of what the rule detects
`id`	string (UUID)	no	Globally unique identifier; UUID v4 recommended
`name`	string (max 256)	no	Human-readable name for correlation rule references
`related`	array of objects	no	Cross-references to other rules
`taxonomy`	string (max 256)	no	Taxonomy identifier used in the rule
`status`	string (enum)	no	Rule lifecycle state (see below)
`description`	string	no	Detailed rule description
`references`	array of strings	no	External URLs and references
`author`	string	no	Rule author
`date`	string (ISO 8601)	no	Creation date
`modified`	string (ISO 8601)	no	Last modification date
`tags`	array of strings	no	Namespaced tags (ATT&CK, CVE, TLP, etc.)
`level`	string (enum)	no	Severity level (see below)
`logsource`	object	yes	Log source selector
`detection`	object	yes	Detection logic with selections and condition
`falsepositives`	array of strings	no	Known false-positive scenarios
`fields`	array of strings	no	Log field names to include in output
`license`	string	no	Rule license identifier (e.g. `DRL 1.1`)

`status` enum¶

Value	Meaning
`stable`	No obvious false positives in multiple environments over a long period
`test`	No obvious false positives on a limited set of test systems
`experimental`	Not tested outside lab environments; could lead to many false positives
`deprecated`	Replaced by or covered by another rule; linked via the `related` field
`unsupported`	Cannot be used in its current state (special log, home-made fields, etc.)

`level` enum¶

Value	Meaning
`informational`	Not an attack, but of security interest
`low`	Low severity
`medium`	Medium severity
`high`	High severity
`critical`	Critical severity

`logsource` object¶

Property	Type	Description
`product`	string	Log source product (e.g. `windows`, `linux`, `aws`)
`service`	string	Log source service (e.g. `security`, `sysmon`)
`category`	string	Log source category (e.g. `process_creation`, `file_event`)
`definition`	string	Free-text definition for custom log sources

`related[].type` enum¶

Value	How ZettelForge maps it
`derived`	`related_to` KG edge
`obsolete`	`superseded_by` KG edge
`merged`	`related_to` KG edge
`renamed`	`related_to` KG edge
`similar`	`related_to` KG edge

Only obsolete produces a superseded_by edge; all other related[].type values produce related_to.
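The mapping in the table reduces to a single conditional. The sketch below is illustrative only: `relation_for_related_type` is a hypothetical helper, and the rule IDs are made up.

```python
def relation_for_related_type(related_type: str) -> str:
    """Map a Sigma related[].type value to the KG relation name from the table above."""
    return "superseded_by" if related_type == "obsolete" else "related_to"


# Example `related` block as it might appear in a parsed rule (IDs are made up).
related = [
    {"id": "00000000-0000-0000-0000-000000000001", "type": "obsolete"},
    {"id": "00000000-0000-0000-0000-000000000002", "type": "similar"},
]
edges = [(relation_for_related_type(entry["type"]), entry["id"]) for entry in related]
for rel, target in edges:
    print(rel, "->", target)
```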

Correlation and filter schemas¶

Both schemas reuse the common metadata fields of the detection rule schema and add format-specific keys.

Correlation rule (correlation key present):

Property	Type	Description
`correlation.type`	string	One of `event_count`, `value_count`, `temporal`, `temporal_ordered`
`correlation.rules`	string or array	Referenced rule names or IDs
`correlation.group-by`	array	Fields to group correlated events by
`correlation.timespan`	string	Time window (e.g. `15m`, `1h`)

Filter rule (filter key present):

Property	Type	Description
`filter`	object	Filter definition with selections and condition

Parser API¶

`parse_yaml()`¶

def parse_yaml(text: str) -> dict[str, Any]

Parses Sigma YAML text into a validated dict.

Steps:

1. yaml.safe_load(text): raises SigmaParseError on bad YAML.
2. _stringify_dates(): recursively coerces datetime.date and datetime.datetime values back to ISO-8601 strings. PyYAML auto-converts date strings; reversing that conversion prevents false schema violations.
3. validate(): dispatches to the appropriate vendored schema; SigmaValidationError is raised on failure.
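The date coercion in step 2 can be sketched as a small recursive function. This is a minimal stand-in written for illustration, not the library's `_stringify_dates`; the name `stringify_dates` and the sample dict are assumptions.

```python
import datetime as dt
from typing import Any


def stringify_dates(value: Any) -> Any:
    """Recursively replace date/datetime objects with ISO-8601 strings."""
    # datetime.datetime is a subclass of datetime.date, so one check covers both.
    if isinstance(value, dt.date):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: stringify_dates(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify_dates(item) for item in value]
    return value


# What a YAML loader typically hands back for an unquoted `date: 2024-08-08`.
loaded = {
    "title": "Example",
    "date": dt.date(2024, 8, 8),
    "nested": [{"seen": dt.datetime(2024, 8, 9, 12, 0)}],
}
print(stringify_dates(loaded))
```

Without this step, a schema that declares `date` as a string would reject the `datetime.date` object even though the YAML source was valid.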

`parse_file()`¶

def parse_file(path: str | Path) -> dict[str, Any]

Reads and parses a Sigma rule file. Adds the file path to any exception message for easier debugging.

File size limit: MAX_RULE_FILE_BYTES = 1_048_576 (1 MB)
Raises SigmaParseError if the file exceeds 1 MB, is unreadable, or contains bad YAML
Raises SigmaValidationError if the rule fails schema validation
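A size guard of this kind can be sketched with `Path.stat()` so the check happens before any file content is read. The code below is an assumption-laden illustration: `read_rule_text` is a hypothetical helper, and `SigmaParseError` is redefined locally so the block runs without zettelforge installed.

```python
from pathlib import Path
from typing import Union

MAX_RULE_FILE_BYTES = 1_048_576  # 1 MB, the limit stated above


class SigmaParseError(ValueError):
    """Local stand-in for zettelforge.sigma.SigmaParseError."""


def read_rule_text(path: Union[str, Path]) -> str:
    """Read a rule file, refusing oversized or unreadable files."""
    p = Path(path)
    try:
        size = p.stat().st_size  # metadata only; the content is not read yet
        if size > MAX_RULE_FILE_BYTES:
            raise SigmaParseError(
                f"{p}: file is {size} bytes, limit is {MAX_RULE_FILE_BYTES}"
            )
        return p.read_text(encoding="utf-8")
    except OSError as exc:
        # Missing file, permission problem, etc. The path is kept in the message.
        raise SigmaParseError(f"{p}: cannot read file: {exc}") from exc
```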

`validate()`¶

def validate(rule: dict[str, Any]) -> ValidationResult

Validates a pre-parsed rule dict against the appropriate schema. Returns a ValidationResult with human-readable error messages that include dotted field paths (e.g. detection.condition: 'condition' is a required property).

Error types¶

`SigmaParseError`¶

class SigmaParseError(ValueError)

Raised when: the YAML is malformed, the file cannot be read, or the file exceeds the size limit.

`SigmaValidationError`¶

class SigmaValidationError(ValueError)

Raised when: the rule fails JSON-schema validation.

`ValidationResult`¶

@dataclass
class ValidationResult:
    valid: bool
    errors: list[str] = field(default_factory=list)

    def __bool__(self) -> bool:
        return self.valid

errors contains one entry per violation, with a dotted path to the offending field:

<root>: 'title' is a required property
detection.condition: 'selections' is a required property
logsource: 'product' is a required property
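The sketch below shows how the truthiness of `ValidationResult` and the dotted-path error format fit together. The dataclass is copied from the definition above; `format_error` is a hypothetical helper, and joining path segments (including list indices) with dots is an assumption about the format.

```python
from dataclasses import dataclass, field
from typing import Sequence, Union


@dataclass
class ValidationResult:  # same shape as the dataclass documented above
    valid: bool
    errors: list[str] = field(default_factory=list)

    def __bool__(self) -> bool:
        return self.valid


def format_error(path: Sequence[Union[str, int]], message: str) -> str:
    """Render one violation as '<dotted.path>: <message>', using <root> for top level."""
    dotted = ".".join(str(part) for part in path) or "<root>"
    return f"{dotted}: {message}"


result = ValidationResult(
    valid=False,
    errors=[
        format_error([], "'title' is a required property"),
        format_error(["logsource"], "'product' is a required property"),
    ],
)
if not result:  # __bool__ makes the result usable directly in a condition
    for err in result.errors:
        print(err)
```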

`DetectionRule` base class¶

SigmaRule inherits from DetectionRule (zettelforge.detection.base). Both Sigma and YARA rules share these base fields:

Field	Type	Description
`rule_id`	`str`	Unique identifier (upstream UUID or content-hash prefix)
`title`	`str`	Rule title
`source_format`	`str`	`"sigma"` for Sigma rules
`content_sha256`	`str`	SHA-256 of the canonical YAML form (stable dedupe key)
`description`	`str | None`	Rule description
`author`	`str | None`	Rule author
`date`	`str | None`	Creation date (ISO-8601)
`modified`	`str | None`	Last modification date (ISO-8601)
`references`	`list[str]`	External references
`tags`	`list[str]`	Raw Sigma tags
`level`	`str | None`	`informational` | `low` | `medium` | `high` | `critical`
`status`	`str | None`	`stable` | `test` | `experimental` | `deprecated` | `unsupported`
`tlp`	`str | None`	TLP marking
`license`	`str | None`	Rule license
`source_repo`	`str | None`	Upstream repository
`source_path`	`str | None`	Path within source repo
`extra`	`dict[str, Any]`	Extension fields

`SigmaRule` dataclass¶

SigmaRule extends DetectionRule with Sigma-specific fields:

@dataclass
class SigmaRule(DetectionRule):
    logsource_product: str | None = None
    logsource_service: str | None = None
    logsource_category: str | None = None
    rule_level: str | None = None      # raw Sigma level before enum mapping
    rule_status: str | None = None     # raw Sigma status before enum mapping
    sigma_format_version: str | None = None
    detection_body: str | None = None  # YAML-serialized detection/correlation block
    rule_type: str = "detection"       # "detection" | "correlation" | "filter"
    fields: list[str] = field(default_factory=list)
    falsepositives: list[str] = field(default_factory=list)

rule_type is inferred automatically: "correlation" if the rule dict has a correlation key, "filter" if it has a filter key, otherwise "detection".

rule_id falls back to "sigma_" + content_sha256[:16] when the rule has no id field. This makes the ID deterministic and stable for re-ingest deduplication.
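The fallback can be sketched as follows. Note the loud assumption: the library hashes a canonical YAML form, whereas this sketch substitutes sorted-key JSON as the canonical serialization so that it runs with the standard library only. The resulting hashes will therefore not match real zettelforge IDs; only the shape of the logic is illustrated. Both function names are hypothetical.

```python
import hashlib
import json
from typing import Any


def content_sha256(rule_dict: dict[str, Any]) -> str:
    """Hash a canonical serialization of the rule.

    ASSUMPTION: sorted-key JSON stands in for the canonical YAML form.
    """
    canonical = json.dumps(rule_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def resolve_rule_id(rule_dict: dict[str, Any]) -> str:
    """Use the upstream id when present, else a content-hash prefix."""
    upstream = rule_dict.get("id")
    if upstream:
        return str(upstream)
    return "sigma_" + content_sha256(rule_dict)[:16]


rule = {"title": "Example", "logsource": {"product": "windows"}, "detection": {}}
print(resolve_rule_id(rule))
```

Because the serialization is key-order independent, two dicts with the same content produce the same fallback ID, which is what makes re-ingest deduplication work.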

`from_rule_dict()` / `rule_to_entities()`¶

def from_rule_dict(
    rule_dict: dict[str, Any]
) -> tuple[SigmaRule, list[dict[str, Any]]]

rule_to_entities is an alias for from_rule_dict (same object). Both names are exported from zettelforge.sigma.

Converts a validated rule dict into:

- A SigmaRule instance with all fields populated
- A list of relation dicts describing edges into the knowledge graph

Relation dict shape¶

{
    "from_type": "SigmaRule",
    "from_value": "<rule_id>",
    "rel":        "<relation_type>",
    "to_type":    "<entity_type>",
    "to_value":   "<entity_value>",
    "properties": {...},   # optional
}

Relation types emitted¶

Relation	`to_type`	When emitted
`applies_to`	`LogSource`	One edge per non-null `logsource` facet (`product`, `service`, `category`). `to_value` format: `facet:value` (e.g. `product:windows`)
`tagged_with`	`SigmaTag`	Every tag in `tags[]`, always. Preserves raw provenance. `to_value` is the raw tag string
`detects`	`AttackPattern`	`attack.t*` tags (technique and sub-technique). `to_value` uppercased: `T1059`, `T1059.001`
`references_cve`	`Vulnerability`	`cve.*` tags. `to_value` normalized to `CVE-YYYY-NNNN`
`attributed_to`	`IntrusionSet`	`attack.g*` tags (ATT&CK group IDs). `to_value` uppercased: `G0007`
`attributed_to`	`Malware`	`attack.s*` tags (ATT&CK software IDs). `to_value` uppercased: `S0154`
`superseded_by`	`SigmaRule`	`related[].type == "obsolete"`
`related_to`	`SigmaRule`	`related[].type` any other value

Tags in the tlp.* and detection.* namespaces, and ATT&CK tactic names (e.g. attack.initial-access), emit only a tagged_with edge and are not upgraded to typed entities.
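To make the first two rows of the table concrete, the sketch below builds `applies_to` and `tagged_with` relation dicts in the shape shown earlier. It is a simplified illustration, not the library's mapping code: `base_relations` is a hypothetical name, and typed edges such as `detects` are left out.

```python
from typing import Any


def base_relations(rule_id: str, rule_dict: dict[str, Any]) -> list[dict[str, Any]]:
    """Emit applies_to edges per logsource facet and tagged_with edges per tag."""
    relations: list[dict[str, Any]] = []
    logsource = rule_dict.get("logsource") or {}
    for facet in ("product", "service", "category"):
        value = logsource.get(facet)
        if value is not None:  # one edge per non-null facet
            relations.append({
                "from_type": "SigmaRule",
                "from_value": rule_id,
                "rel": "applies_to",
                "to_type": "LogSource",
                "to_value": f"{facet}:{value}",
            })
    for tag in rule_dict.get("tags") or []:
        relations.append({
            "from_type": "SigmaRule",
            "from_value": rule_id,
            "rel": "tagged_with",
            "to_type": "SigmaTag",
            "to_value": tag,  # raw tag string, preserved as-is
        })
    return relations


rule = {
    "logsource": {"product": "windows", "category": "process_creation"},
    "tags": ["attack.t1059", "tlp.amber"],
}
for rel in base_relations("sigma_example", rule):
    print(rel["rel"], "->", rel["to_type"], rel["to_value"])
```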

`resolve_sigma_tag()`¶

def resolve_sigma_tag(tag: str) -> tuple[str, str] | None

Resolves a single Sigma tag string to (entity_type, entity_value) for typed KG cross-references, or None for metadata-only tags.

Input pattern	Returns
`attack.t1059`	`('AttackPattern', 'T1059')`
`attack.t1059.001`	`('AttackPattern', 'T1059.001')`
`attack.g0007`	`('IntrusionSet', 'G0007')`
`attack.s0154`	`('Malware', 'S0154')`
`cve.2021-44228`	`('Vulnerability', 'CVE-2021-44228')`
`cve.2024.3094`	`('Vulnerability', 'CVE-2024-3094')`
`tlp.amber`	`None` (metadata-only)
`detection.emerging`	`None` (metadata-only)
`attack.initial-access`	`None` (tactic name, not a typed entity)

CVE normalization accepts both Sigma's dot separator (cve.2021.44228) and hyphen (cve.2021-44228). Both produce the canonical CVE-YYYY-NNNN format.
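The table above can be reproduced with two regular expressions. The following is a sketch under stated assumptions, not the code in `sigma.tags`: the function name `resolve_tag_sketch`, the exact patterns, and the handling of edge cases (for example, rejecting sub-IDs on group and software tags) are all guesses made for illustration.

```python
import re
from typing import Optional

# attack.t1059, attack.t1059.001, attack.g0007, attack.s0154
_ATTACK_ID = re.compile(r"^attack\.([tgs])(\d{4})(?:\.(\d{3}))?$", re.IGNORECASE)
# cve.2021-44228 or cve.2021.44228
_CVE = re.compile(r"^cve\.(\d{4})[.-](\d{4,})$", re.IGNORECASE)
_ATTACK_TYPES = {"t": "AttackPattern", "g": "IntrusionSet", "s": "Malware"}


def resolve_tag_sketch(tag: str) -> Optional[tuple[str, str]]:
    """Return (entity_type, entity_value) for typed tags, None for metadata-only tags."""
    match = _ATTACK_ID.match(tag)
    if match:
        kind, number, sub = match.group(1).lower(), match.group(2), match.group(3)
        if sub and kind != "t":
            return None  # assumption: only techniques carry a sub-ID
        value = f"{kind.upper()}{number}" + (f".{sub}" if sub else "")
        return (_ATTACK_TYPES[kind], value)
    match = _CVE.match(tag)
    if match:
        return ("Vulnerability", f"CVE-{match.group(1)}-{match.group(2)}")
    return None  # tlp.*, detection.*, tactic names, anything unrecognized


for tag in ("attack.t1059.001", "cve.2024.3094", "attack.initial-access"):
    print(tag, "->", resolve_tag_sketch(tag))
```

Tactic names such as `attack.initial-access` fall through because the ATT&CK pattern requires a `t`, `g`, or `s` followed by four digits.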

Ingest API¶

`ingest_rule()`¶

def ingest_rule(
    rule: dict | str | Path,
    mm: MemoryManager,
    *,
    domain: str = "detection",
    source_ref: str | None = None,
    sync: bool = True,
) -> tuple[MemoryNote, list[dict[str, Any]]]

Orchestrates the full pipeline for a single rule.

rule accepts three input types:

Type	Behavior
`dict`	Already-parsed rule dict; skips YAML parsing
`str`	YAML text, or a file path string if no newlines and the path exists
`Path`	File path; calls `parse_file()`

Pipeline steps:

1. Coerce input to (rule_dict, default_source_ref)
2. from_rule_dict(rule_dict) → (SigmaRule entity, relations)
3. Compute source_ref = source_ref or f"sigma:{entity.rule_id}:{entity.content_sha256[:12]}"
4. Idempotency check: store.get_note_by_source_ref(source_ref) returns the existing note if one is found
5. Build note content: full YAML body + one-line summary ([sigma] <title> level=<level> status=<status> logsource=[...])
6. mm.remember(content, source_type="sigma_rule", source_ref=source_ref, domain=domain, sync=sync)
7. store.add_kg_edge(...) for each relation

Returns (note, relations). On an idempotent re-ingest of an unchanged rule, returns the existing note and the freshly computed (but not re-persisted) relations.

Raises:

- ValueError if mm is None
- SigmaParseError / SigmaValidationError from the parse layer
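The idempotency behavior of steps 3, 4, and 7 can be sketched against an in-memory store. Everything here is a stand-in: `FakeNote`, `FakeStore`, and `ingest_sketch` are hypothetical names, and the real `MemoryManager` and store interfaces are not reproduced beyond the `get_note_by_source_ref` lookup named above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FakeNote:
    source_ref: str
    content: str


@dataclass
class FakeStore:
    """In-memory stand-in for the note store and KG (hypothetical)."""
    notes: dict[str, FakeNote] = field(default_factory=dict)
    edges: list[dict[str, Any]] = field(default_factory=list)

    def get_note_by_source_ref(self, source_ref: str) -> Optional[FakeNote]:
        return self.notes.get(source_ref)


def ingest_sketch(
    store: FakeStore,
    rule_id: str,
    content_sha256: str,
    content: str,
    relations: list[dict[str, Any]],
    source_ref: Optional[str] = None,
) -> tuple[FakeNote, list[dict[str, Any]]]:
    ref = source_ref or f"sigma:{rule_id}:{content_sha256[:12]}"
    existing = store.get_note_by_source_ref(ref)
    if existing is not None:
        # Unchanged rule: return the stored note; relations are not re-persisted.
        return existing, relations
    note = FakeNote(source_ref=ref, content=content)
    store.notes[ref] = note
    store.edges.extend(relations)  # stands in for one add_kg_edge call per relation
    return note, relations


store = FakeStore()
rels = [{"rel": "tagged_with", "to_value": "attack.t1059"}]
first, _ = ingest_sketch(store, "sigma_abc", "a" * 64, "title: x", rels)
second, _ = ingest_sketch(store, "sigma_abc", "a" * 64, "title: x", rels)
print(first is second, len(store.edges))
```

A changed content hash yields a different `source_ref`, so the lookup misses and a new note is created.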

`ingest_rules_dir()`¶

def ingest_rules_dir(
    path: str | Path,
    mm: MemoryManager,
    *,
    glob: str = "**/*.yml",
    domain: str = "detection",
    bulk: bool = False,
    flush_timeout: float | None = None,
) -> tuple[int, int]

Walks a directory and ingests every matching Sigma rule.

Returns (ingested, skipped). Parse and validation errors are logged as warnings and increment skipped; they do not abort the walk.

Security controls during directory walk:

- Symlinks are skipped; a warning is logged
- Paths that resolve outside the root directory are skipped

When bulk=True, sync=False is passed to each ingest_rule call and mm.flush(timeout=flush_timeout) is called once at the end. Use bulk mode for large imports.
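The two walk-time controls (symlink skipping and root containment) can be sketched with `pathlib`. This is an illustrative filter only, assuming Python 3.9+ for `Path.is_relative_to`; the generator name `iter_rule_files` and the logger name are hypothetical, and the bulk/flush behavior is not modeled.

```python
import logging
from pathlib import Path
from typing import Iterator, Union

log = logging.getLogger("sigma_walk_sketch")  # hypothetical logger name


def iter_rule_files(root: Union[str, Path], glob: str = "**/*.yml") -> Iterator[Path]:
    """Yield rule files under root, skipping symlinks and paths that escape root."""
    root_resolved = Path(root).resolve()
    for candidate in sorted(root_resolved.glob(glob)):
        if candidate.is_symlink():
            log.warning("skipping symlink: %s", candidate)
            continue
        if not candidate.is_file():
            continue
        # Containment check: catches files reached through a symlinked directory.
        if not candidate.resolve().is_relative_to(root_resolved):
            log.warning("skipping path outside root: %s", candidate)
            continue
        yield candidate
```

A caller would loop over `iter_rule_files(path)` and hand each file to the ingest function, counting failures as skipped rather than aborting the walk.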

Idempotency¶

Source ref format: sigma:<rule_id>:<content_sha256[:12]>

Re-ingesting an unchanged rule returns the original note without creating a duplicate. Re-ingesting a modified rule (different content hash) creates a new note. This matches the YARA ingest idempotency pattern.

Security controls¶

Control	Details
File size cap	`MAX_RULE_FILE_BYTES = 1_048_576` (1 MB). Raises `SigmaParseError` before reading the file.
YAML parsing	`yaml.safe_load` only. No arbitrary Python object deserialization.
Symlink traversal	Blocked in `ingest_rules_dir`. Symlinks are skipped with a warning.
Path traversal	Files that resolve outside the rules root are skipped with a warning.

KG edge metadata¶

Every edge written by ingest_rule carries two additional properties:

Property	Value	Purpose
`edge_type`	`"detection"`	Distinguish from causal (`"causal"`) or heuristic (`"heuristic"`) edges
`source`	`"sigma_ingest"`	Provenance tag for downstream graph queries
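A minimal sketch of stamping these two values onto a relation before it is written follows. It assumes the metadata is merged into the relation's `properties` mapping, which the reference does not state explicitly; `with_edge_metadata` is a hypothetical helper.

```python
from typing import Any

EDGE_METADATA = {"edge_type": "detection", "source": "sigma_ingest"}


def with_edge_metadata(relation: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the relation with the ingest metadata merged in.

    ASSUMPTION: metadata lives under the optional "properties" key.
    """
    properties = dict(relation.get("properties") or {})
    properties.update(EDGE_METADATA)
    return {**relation, "properties": properties}


relation = {
    "from_type": "SigmaRule",
    "from_value": "sigma_example",
    "rel": "detects",
    "to_type": "AttackPattern",
    "to_value": "T1059",
}
print(with_edge_metadata(relation)["properties"])
```

Downstream graph queries could then filter on `edge_type == "detection"` to separate these edges from causal or heuristic ones.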

CLI¶

python -m zettelforge.sigma.ingest [--dry-run] [--domain DOMAIN] [--glob GLOB] <path>

path can be a single .yml / .yaml file or a directory.

Flag	Default	Description
`--dry-run`	off	Parse + validate + map entities without writing to memory or KG
`--domain`	`detection`	Memory domain for ingested notes
`--glob`	`**/*.yml`	Glob pattern for directory walks; also matches `.yaml` automatically

Dry-run output per rule:

OK  rules/proc_creation_win_whoami.yml  id=7e3d88a2-...  type=detection  tags=3  edges=5
FAIL rules/bad.yml  rules/bad.yml: invalid YAML: ...

Dry-run summary: 12/13 parsed, 1 failed.

Note

The LLM rule explainer (detection/explainer.py) is not invoked by the CLI in v1. It runs asynchronously from the enrichment worker in v1.1+. The ZETTELFORGE_EXPLAIN_RPM env var (default 60) caps the explainer's call rate when wired up.

Quick reference¶

from zettelforge.sigma import (
    parse_yaml, parse_file, validate,
    from_rule_dict, ingest_rule,
    SigmaParseError, SigmaValidationError,
)

# Parse + validate only
rule_dict = parse_yaml(yaml_text)
result = validate(rule_dict)
if not result:
    for err in result.errors:
        print(err)

# Parse + build entity + inspect relations
rule_dict = parse_file("/path/to/rule.yml")
entity, relations = from_rule_dict(rule_dict)
print(entity.rule_id, entity.rule_type, entity.rule_level)
for r in relations:
    print(f"  {r['rel']} -> {r['to_type']}:{r['to_value']}")

# Full ingest (parse + persist + KG edges)
note, relations = ingest_rule("/path/to/rule.yml", mm)
