Sigma Schema Reference¶
Module: zettelforge.sigma.parser, zettelforge.sigma.schemas
from zettelforge.sigma.parser import (
    parse_yaml, parse_file, validate,
    SigmaParseError, SigmaValidationError, ValidationResult,
)
from zettelforge.sigma.ingest import ingest_rule, ingest_rules_dir
Overview¶
ZettelForge vendors three SigmaHQ JSON schemas for rule validation and dispatches
rules to the correct schema based on their top-level keys. The parser uses
jsonschema.Draft202012Validator for schema conformance checking.
SigmaHQ specification version: V2.0.0 (2024-08-08)
JSON Schema draft: 2020-12
Vendored Schemas¶
Three schema files live in src/zettelforge/sigma/schemas/:
| File | Schema Title | Used When |
|---|---|---|
| sigma-detection-rule-schema.json | Sigma rule specification V2.0.0 | No correlation: or filter: key present |
| sigma-correlation-rules-schema.json | Sigma correlation rules | correlation: key present in top-level dict |
| sigma-filters-schema.json | Sigma filters | filter: key present in top-level dict |
Schema Loading¶
Schemas are loaded lazily and cached in _SCHEMA_CACHE:
def _load_schema(name: str) -> dict[str, Any]:
    """Load a vendored schema JSON by filename (cached)."""
The importlib.resources API locates schema files within the zettelforge.sigma.schemas
package directory. Schemas are loaded as raw JSON (not YAML) and validated against
the Draft 2020-12 meta-schema.
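The lazy, cached loading described above can be sketched with the standard library alone. To stay self-contained, this sketch reads from an arbitrary directory instead of going through importlib.resources; `load_schema_from_dir`, its `schema_dir` parameter, and `_CACHE` are hypothetical stand-ins and not the module's actual `_load_schema()` or `_SCHEMA_CACHE`.

```python
import json
from pathlib import Path
from typing import Any

# Module-level cache keyed by schema filename (plays the role of _SCHEMA_CACHE).
_CACHE: dict[str, dict[str, Any]] = {}


def load_schema_from_dir(name: str, schema_dir: Path) -> dict[str, Any]:
    """Load a schema JSON file on first use, then serve it from the cache."""
    if name not in _CACHE:
        # The file is only opened the first time this name is requested.
        with (schema_dir / name).open(encoding="utf-8") as fh:
            _CACHE[name] = json.load(fh)
    return _CACHE[name]
```

Because the cache stores the parsed dict, repeated calls return the same object and never touch the filesystem again.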
Schema Dispatch Logic¶
The _pick_schema() function implements dispatch:
def _pick_schema(rule: dict[str, Any]) -> dict[str, Any]:
    if isinstance(rule, dict):
        if "correlation" in rule:
            return _load_schema("sigma-correlation-rules-schema.json")
        if "filter" in rule:
            return _load_schema("sigma-filters-schema.json")
    return _load_schema("sigma-detection-rule-schema.json")
Dispatch order:
1. correlation key present -> correlation schema
2. filter key present -> filters schema
3. Otherwise -> detection-rule schema
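To see the dispatch order in action without loading any schema files, the following sketch returns only the schema filename. `pick_schema_name` is a hypothetical helper written for this illustration; the sample rule dicts are deliberately incomplete.

```python
from typing import Any


def pick_schema_name(rule: Any) -> str:
    """Return the schema filename that the dispatch order above selects."""
    if isinstance(rule, dict):
        if "correlation" in rule:
            return "sigma-correlation-rules-schema.json"
        if "filter" in rule:
            return "sigma-filters-schema.json"
    # Anything else, including non-dict input, falls through to the detection schema.
    return "sigma-detection-rule-schema.json"


detection = {"title": "t", "logsource": {}, "detection": {}}
correlation = {"title": "t", "correlation": {"type": "event_count"}}
both = {"title": "t", "correlation": {}, "filter": {}}

print(pick_schema_name(detection))
print(pick_schema_name(correlation))
print(pick_schema_name(both))  # correlation is checked first, so it wins
```

Note that a rule carrying both keys is validated against the correlation schema, because that check runs first.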
Detection Rule Schema (V2.0.0)¶
Required top-level keys: title, logsource, detection
Top-Level Properties¶
| Property | Type | Required | Description |
|---|---|---|---|
| title | string (max 256) | yes | Brief title describing what the rule detects |
| id | string (UUID format) | no | Globally unique identifier (UUID v4 recommended) |
| related | array of objects | no | Related rule references with id and type |
| name | string (max 256) | no | Human-readable name for correlation rule references |
| taxonomy | string (max 256) | no | Taxonomy used in the rule |
| status | string (enum) | no | One of stable, test, experimental, deprecated, unsupported |
| description | string | no | Detailed rule description |
| references | array of strings | no | External references and URLs |
| author | string | no | Rule author |
| date | string (date) | no | Creation date (ISO 8601) |
| modified | string (date) | no | Last modification date |
| tags | array of strings | no | Rule tags (MITRE ATT&CK, CVE, etc.) |
| level | string (enum) | no | One of informational, low, medium, high, critical |
| logsource | object | yes | Log source definition with product, service, category |
| detection | object | yes | Detection logic with selections and condition |
| falsepositives | array of strings | no | Known false-positive scenarios |
| fields | array of strings | no | Log field names required for correlation |
| license | string | no | Rule license identifier |
logsource Object¶
| Property | Type | Description |
|---|---|---|
| product | string | Log source product (e.g. windows, linux) |
| service | string | Log source service (e.g. security, sysmon) |
| category | string | Log source category (e.g. process_creation, file_event) |
| definition | string | Log source definition text |
detection Object¶
The detection block must contain at minimum a condition string and one or
more named selection groups. Each selection is a mapping of field names to
values or lists of values supporting Sigma's modifiers (|contains, |endswith,
|startswith, |re, etc.).
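Selection keys combine a field name with zero or more modifiers, separated by `|`. The sketch below shows one way to take such a key apart; `split_field` is a hypothetical helper for illustration and is not part of the parser API.

```python
def split_field(key: str) -> tuple[str, list[str]]:
    """Split a selection key such as 'Image|endswith' into field name and modifiers."""
    field_name, *modifiers = key.split("|")
    return field_name, modifiers


selection = {
    "Image|endswith": "\\whoami.exe",
    "CommandLine|contains|all": ["/user", "/priv"],
    "User": "SYSTEM",
}
for key in selection:
    print(split_field(key))
```

A key without any `|` yields an empty modifier list, which corresponds to a plain equality match.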
status Enum Values¶
| Value | Description |
|---|---|
| stable | No false positives in multiple environments over long period |
| test | No obvious false positives on limited test systems |
| experimental | New rule, potentially many false positives |
| deprecated | Replaced by another rule or no longer relevant |
| unsupported | Cannot be supported due to product changes |
level Enum Values¶
| Value | Description |
|---|---|
| informational | Not an attack, but of security interest |
| low | Low severity |
| medium | Medium severity |
| high | High severity |
| critical | Critical severity |
related[].type Enum Values¶
| Value | Description |
|---|---|
| derived | Rule derived from referred rule (rule may still be active) |
| obsolete | This rule obsoletes the referred rule |
| merged | Rule merged from referred rules |
| renamed | Rule was renamed from previous identifier |
| similar | Related rules with similar detection content |
Correlation Rules Schema¶
The correlation schema adds the correlation object and removes the
detection requirement. A correlation rule defines relationships between
multiple detection rules.
Key properties (in addition to detection rule common fields):
| Property | Type | Description |
|---|---|---|
| correlation | object | Correlation definition with type, rule, group-by, timespan |
| correlation.type | string | One of event_count, value_count, temporal, ordered, sequence |
| correlation.rule | string or array | Referenced rule names or IDs |
Filters Schema¶
The filters schema adds the filter key. Filter rules define event filtering
conditions applied before detection.
Key properties (in addition to detection rule common fields):
| Property | Type | Description |
|---|---|---|
| filter | object | Filter definition with selections and condition |
Parser API¶
parse_yaml()¶
Parses Sigma YAML text into a validated dict.
- Calls yaml.safe_load(text) to parse YAML
- Coerces YAML-parsed date/datetime objects back to ISO-8601 strings (PyYAML's default resolver converts ISO date strings to datetime.date)
- Runs validate() against the appropriate vendored schema
- Raises SigmaParseError on YAML errors or SigmaValidationError on schema violations
Date coercion: The _stringify_dates() function recursively walks the
parsed dict and converts any datetime.date or datetime.datetime instances
to ISO-8601 string format. This prevents false schema validation failures
caused by PyYAML's auto-conversion of date-like strings.
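A minimal sketch of such a recursive walk is shown here. It is written from the description above rather than taken from the module, so `stringify_dates` should be read as an approximation of `_stringify_dates()`, not its actual source.

```python
import datetime
from typing import Any


def stringify_dates(value: Any) -> Any:
    """Recursively replace date/datetime objects with ISO-8601 strings."""
    # datetime.datetime is a subclass of datetime.date, so one check covers both.
    if isinstance(value, datetime.date):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: stringify_dates(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify_dates(item) for item in value]
    return value


parsed = {
    "title": "Example",
    "date": datetime.date(2024, 8, 8),
    "related": [{"modified": datetime.datetime(2024, 8, 8, 12, 30)}],
}
clean = stringify_dates(parsed)
print(clean["date"])                    # 2024-08-08
print(clean["related"][0]["modified"])  # 2024-08-08T12:30:00
```

The sketch returns a new structure instead of mutating the input; whether the real function works in place is not stated in this reference.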
parse_file()¶
Parses a Sigma rule file into a validated dict.
- File size limit: MAX_RULE_FILE_BYTES = 1_048_576 (1 MB)
- Raises SigmaParseError if the file is oversized (> 1 MB) or unreadable
- Wraps any SigmaParseError / SigmaValidationError with the file path prefix
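The size guard and path-prefixed errors can be sketched as follows. `RuleReadError` and `read_rule_text` are hypothetical names used so the example runs on its own; the real code raises SigmaParseError, and its exact message format is not documented here.

```python
from pathlib import Path

MAX_RULE_FILE_BYTES = 1_048_576  # 1 MB, as documented above


class RuleReadError(Exception):
    """Hypothetical stand-in for SigmaParseError."""


def read_rule_text(path: Path) -> str:
    """Read a rule file, rejecting oversized or unreadable files."""
    try:
        size = path.stat().st_size
        if size > MAX_RULE_FILE_BYTES:
            raise RuleReadError(
                f"{path}: file is {size} bytes, limit is {MAX_RULE_FILE_BYTES}"
            )
        return path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        # Prefix the path so the caller can tell which file failed.
        raise RuleReadError(f"{path}: {exc}") from exc
```

Checking the size via stat() before reading means an oversized file is rejected without loading it into memory.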
validate()¶
Validates a parsed Sigma rule dict against the appropriate schema. Returns a
ValidationResult listing each violation with a dotted path to the offending
field rather than opaque jsonschema internals.
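jsonschema reports the location of each violation as a sequence of keys and list indices. One way to turn that sequence into the dotted form is sketched below; `dotted_path` is a hypothetical helper, and rendering list indices as plain numbers (for example `tags.0`) is an assumption about the output format.

```python
from collections.abc import Iterable


def dotted_path(parts: Iterable[object]) -> str:
    """Join path segments with dots; an empty path is reported as '<root>'."""
    joined = ".".join(str(part) for part in parts)
    return joined or "<root>"


print(dotted_path(["detection", "condition"]))  # detection.condition
print(dotted_path(["tags", 0]))                 # tags.0
print(dotted_path([]))                          # <root>
```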
Error Types¶
SigmaParseError¶
Raised when a Sigma rule cannot be parsed (bad YAML, I/O error, oversized file).
SigmaValidationError¶
Raised when a Sigma rule fails JSON-schema validation against the vendored schema.
ValidationResult¶
from dataclasses import dataclass, field

@dataclass
class ValidationResult:
    valid: bool
    errors: list[str] = field(default_factory=list)

    def __bool__(self) -> bool:
        return self.valid
errors contains human-readable messages with dotted field paths:
- detection.condition: 'selections' is a required property
- logsource: 'product' is a required property
- <root>: 'title' is a required property
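Because `__bool__` returns `valid`, a result can be used directly in a condition. The sketch below repeats the dataclass definition so it runs on its own; the error message is an illustrative value, not captured output.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    valid: bool
    errors: list[str] = field(default_factory=list)

    def __bool__(self) -> bool:
        return self.valid


result = ValidationResult(
    valid=False,
    errors=["<root>: 'title' is a required property"],
)
if not result:
    # Truthiness mirrors result.valid, so this branch handles failures.
    for message in result.errors:
        print(message)
```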
Ingestion Pipeline¶
The ingest_rule() function in sigma.ingest orchestrates the full pipeline:
- Parse & validate YAML against the vendored SigmaHQ schema via parse_file() or parse_yaml()
- Build entity via from_rule_dict(), which returns a SigmaRule instance and a list of relation dicts
- Construct note content (YAML body + summary line with title, level, status, logsource)
- Persist note via MemoryManager.remember() with source_type="sigma_rule"
- Write KG edges via store.add_kg_edge() for each relation
Idempotency¶
Source ref pattern: sigma:<rule_id>:<content_sha256[:12]>
A store lookup precedes every write. Unchanged rules return the original note.
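The source ref pattern and the lookup-before-write behaviour can be sketched as follows. Which bytes are hashed is not specified in this reference, so hashing the UTF-8 rule text is an assumption; `source_ref`, `ingest_once`, and the in-memory `_notes` dict are hypothetical stand-ins for the real store.

```python
import hashlib


def source_ref(rule_id: str, content: str) -> str:
    """Build 'sigma:<rule_id>:<first 12 hex chars of the content SHA-256>'."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"sigma:{rule_id}:{digest[:12]}"


# In-memory stand-in for the note store, keyed by source ref.
_notes: dict[str, str] = {}


def ingest_once(rule_id: str, content: str) -> tuple[str, bool]:
    """Return (source_ref, created); created is False when the rule is unchanged."""
    ref = source_ref(rule_id, content)
    if ref in _notes:  # the lookup precedes the write
        return ref, False
    _notes[ref] = content
    return ref, True
```

Since the content hash is part of the ref, editing a rule produces a new ref and therefore a new note, while re-ingesting identical content is a no-op.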
Security Guards¶
- File size capped at 1 MB (MAX_RULE_FILE_BYTES)
- Symlinks are never followed during directory walk (ingest_rules_dir)
- Paths that resolve outside the rules root are skipped
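A directory walk with the symlink and containment guards might look like the sketch below. `iter_rule_files` is a hypothetical helper, the `.yml`/`.yaml` extension filter is an assumption, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import os
from pathlib import Path
from typing import Iterator


def iter_rule_files(root: Path) -> Iterator[Path]:
    """Yield .yml/.yaml files under root without following symlinks."""
    root = root.resolve()
    # followlinks=False keeps os.walk from descending into symlinked directories.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for filename in sorted(filenames):
            path = Path(dirpath) / filename
            if path.suffix not in {".yml", ".yaml"}:
                continue
            if path.is_symlink():
                continue  # symlinked files are skipped, not followed
            # Defence in depth: skip anything that resolves outside the root.
            if not path.resolve().is_relative_to(root):
                continue
            yield path
```

Resolving the root once up front keeps the containment check stable even when the caller passes a relative path.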
Edge Tagging¶
All KG edges emitted during ingest carry:
- edge_type: detection — distinguishes from causal or heuristic edges
- source: sigma_ingest — provenance tag for downstream filtering
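As an illustration of how these tags support downstream filtering, the sketch below attaches them to relation dicts. The `rel`, `to_type`, and `to_value` keys follow the relation dicts used elsewhere in this reference; the concrete values and the `tag_edge` helper are hypothetical.

```python
from typing import Any

# The two tags carried by every edge emitted during Sigma ingest.
EDGE_TAGS = {"edge_type": "detection", "source": "sigma_ingest"}


def tag_edge(relation: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the relation with the two provenance tags attached."""
    return {**relation, **EDGE_TAGS}


edges = [
    tag_edge({"rel": "detects", "to_type": "attack_pattern", "to_value": "T1033"}),
    {"rel": "causes", "to_type": "event", "to_value": "e1", "edge_type": "causal"},
]
# Downstream consumers can select Sigma-derived edges by provenance tag.
sigma_edges = [edge for edge in edges if edge.get("source") == "sigma_ingest"]
print(len(sigma_edges))  # 1
```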
File Size Limit¶
Sigma rules are typically a few KB. The 1 MB ceiling catches runaway payloads without blocking normal multi-rule files.
Minimal Example¶
from zettelforge.sigma import parse_yaml, validate
# In this non-raw Python string, '\\' yields a single backslash in the YAML text.
yaml_text = """
title: Suspicious Whoami Execution
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    Image|endswith: '\\whoami.exe'
  condition: selection
"""
rule = parse_yaml(yaml_text)
result = validate(rule)
print(f"Valid: {bool(result)}")
Full Example¶
from zettelforge.sigma import parse_file, from_rule_dict, SigmaValidationError
try:
    rule = parse_file("/path/to/sigma/rules/proc_creation_win_whoami.yml")
    entity, relations = from_rule_dict(rule)
    print(f"Rule: {entity.title}")
    print(f"Type: {entity.rule_type}")
    print(f"Logsource: product={entity.logsource_product}, category={entity.logsource_category}")
    print(f"Relations: {len(relations)}")
    for rel in relations:
        print(f"  {rel['rel']} -> {rel['to_type']}:{rel['to_value']}")
except SigmaValidationError as e:
    print(f"Validation failed: {e}")
Dataclass Round-Trip¶
The parsed rule dict can be round-tripped through from_rule_dict() and
ingest_rule() (which accepts a pre-parsed dict as input):