Use detection rules and the explainer¶

Use this guide when you have ingested Sigma or YARA rules into ZettelForge and you want an LLM-generated, structured explanation of what each rule detects, how it works, and its false-positive patterns.

This covers ZettelForge 2.7.0. The explainer is the shipped detection-rule feature in v1. The match-consumer interface exists but no concrete consumers ship in v1 (see Limitations in v1).

Prerequisites¶

ZettelForge 2.7.0 or newer installed (pip install -U zettelforge)
A configured LLM provider. The explainer calls whatever provider zettelforge.llm_client.get_llm_provider() resolves to. The mock provider returns a deterministic placeholder and makes no network calls, which is useful for testing the wiring.
One or more detection rules in Sigma or YARA format

How the explainer fits¶

The explainer is a standalone function. It is not wired into ingest in v1: parsing or ingesting a Sigma/YARA rule does not automatically generate an explanation. Callers invoke explain() themselves after they have a parsed rule.

The pieces:

DetectionRule (zettelforge.detection.base) is the writeable supertype shared by all formats. SigmaRule and YaraRule are subtypes that share its field contract (rule_id, title, source_format, content_sha256, level, status, tags, and others).
explain(rule, *, rule_body, provider=None) (zettelforge.detection.explainer) sends the rule to the LLM and returns a RuleExplanation.
RuleExplanation is the structured result.

Explain a Sigma rule¶

The end-to-end path is: parse the Sigma YAML, validate it, turn it into a SigmaRule, then explain it.

from zettelforge.sigma import parse_yaml, validate, from_rule_dict
from zettelforge.detection import explain

rule_yaml = """
title: Suspicious PowerShell Download Cradle
id: 11111111-2222-3333-4444-555555555555
status: experimental
description: Detects PowerShell download cradle via Net.WebClient
author: Example
level: high
logsource:
  product: windows
  category: process_creation
detection:
  selection:
    CommandLine|contains: 'Net.WebClient'
  condition: selection
tags:
  - attack.execution
  - attack.t1059.001
"""

rule_dict = parse_yaml(rule_yaml)

result = validate(rule_dict)
if not result.valid:
    raise SystemExit(f"invalid Sigma rule: {result.errors}")

rule, _relations = from_rule_dict(rule_dict)
explanation = explain(rule, rule_body=rule_yaml)

print(explanation.summary)
print("confidence:", explanation.confidence)

validate() returns a ValidationResult with a valid boolean and an errors list. from_rule_dict() returns a (SigmaRule, relations) tuple; the relations describe knowledge-graph edges and are not needed for explanation.

Pass the raw rule text as rule_body. The explainer sends that body, not the parsed object, to the LLM.

What you get back¶

RuleExplanation has these fields:

Field	Meaning
`summary`	One-line description of what the rule detects
`mechanism`	How the rule matches (fields, strings, conditions)
`threat_model`	The behavior or threat the rule targets
`false_positive_patterns`	List of known benign triggers
`related_techniques`	List of related technique identifiers
`confidence`	Float clamped to `[0.0, 1.0]`
`model`	Provider name that produced the explanation
`generated_at`	ISO 8601 timestamp
`schema_version`	RuleExplanation schema version (`1.0`)

Failure handling¶

The explainer never raises for recoverable conditions. If the LLM is offline, returns invalid JSON, returns an empty response, or the call is rate-limited, explain() returns a RuleExplanation with confidence=0.0 and a diagnostic summary (for example explanation unavailable: rate limited or explanation unavailable: llm error (...)). Always check confidence before trusting an explanation:

explanation = explain(rule, rule_body=rule_yaml)
if explanation.confidence == 0.0:
    # degraded result — render the diagnostic summary, do not treat as authoritative
    print("explainer degraded:", explanation.summary)

With the mock provider, explain() returns a canned result: summary="mock provider — no real explanation", confidence=0.0, model="mock:mock". Use it to confirm your call site works without spending tokens.

Rate limiting and cost¶

The explainer enforces an in-process, per-minute cap: 60 calls per minute per process by default. Override it with the ZETTELFORGE_EXPLAIN_RPM environment variable:

export ZETTELFORGE_EXPLAIN_RPM=120

When bulk-explaining many rules, gate enqueue with zettelforge.detection.explainer.rate_limit_ok() before each call; it reports whether the next call fits under the cap without consuming a token. The explainer also re-checks the limit internally, so a caller that bypasses the gate still gets rate-limited rather than overspending.

Security model¶

Rule bodies are untrusted input. The explainer applies several controls before the body reaches the LLM:

The body is wrapped in an <rule_source untrusted="true"> delimiter and the system prompt instructs the model to treat everything inside as data, not instructions.
Any </rule_source> sequence in the body is neutralized so a crafted rule cannot close the delimiter and inject its own instructions.
The body is hard-capped at 8192 characters before the call, limiting prompt-injection blast radius and token cost. Longer bodies are truncated with a ... [truncated] marker.
The verbatim LLM output is never persisted on the returned RuleExplanation; only the coerced, typed fields are kept.

Limitations in v1¶

Match consumers do not ship in v1

ZettelForge defines a DetectionMatchConsumer protocol and a RuleMatchEvent type for adapting external rule-match events (from a SIEM or detection pipeline) into note writes. In v1 the interface is frozen but no concrete consumers are included — zettelforge.detection.ALL_CONSUMERS is empty by design. Consumer implementations are deferred to a future release. If you need to react to live rule matches today, you must write your own consumer against the protocol.

The explainer is not auto-invoked on ingest

Ingesting Sigma or YARA rules does not enqueue explanation jobs in v1. Call explain() yourself after ingest completes.

Detection rules schema — the DetectionRule field contract
Sigma schema reference — Sigma fields ZettelForge recognizes
LLM budgets and timeouts — how provider cost and timeout limits are enforced
Configure Sigma ingestion — getting rules into ZettelForge before you explain them