RFC-002: Universal LLM Provider Interface¶

Field	Value
Author	Patrick Roland
Status	Accepted — partially implemented
Created	2026-04-16
Last updated	2026-07-08
ZettelForge version	v2.7.0
Related RFCs	RFC-001 Conversational Entity Extractor (depends on `generate()` stability); RFC-012 LiteLLM Unified Provider (supersedes Phases 2–3)

Context¶

ZettelForge started with two hardcoded LLM backends: local (llama-cpp-python in-process GGUF) and ollama (HTTP to localhost:11434). Both serve offline use well, but they create friction for three groups:

Solo analysts who have OpenAI or Anthropic API keys and want better synthesis quality than a 3B model provides.
Teams running ZettelForge on cloud infrastructure where LLM endpoints are already available.
Contributors who want to add a new backend without touching extraction and synthesis code.

The original llm_client.py was 151 lines with hardcoded conditional logic. Every new provider added another branch, more config parsing, and harder tests.

Proposal¶

Introduce a pluggable provider interface behind the existing generate() function. The public API stays identical. Internally, generate() resolves the configured provider name to an LLMProvider instance through a thread-safe registry and delegates.

generate()                          # Public API — unchanged
  |
  v
ProviderRegistry.get(name)          # Returns configured LLMProvider instance
  |
  v
LLMProvider.generate(...)           # Protocol method — each provider implements this
  |
  +--> LocalProvider                # llama-cpp-python (built-in, default)
  +--> OllamaProvider               # Ollama HTTP API (built-in)
  +--> LiteLLMProvider              # 100+ cloud providers via LiteLLM (optional extra)
  +--> MockProvider                 # For testing (built-in)

The LLMProvider interface uses typing.Protocol (PEP 544) rather than an abstract base class. No forced inheritance is required — third-party providers need only duck-type compatibility. runtime_checkable enables isinstance() validation when callers want it.

What shipped¶

Phase 1 — Provider infrastructure (v2.3.0)¶

Phase 1 shipped in commit f67db7d, merged and released in v2.3.0.

The llm_providers/ package provides the complete extension surface:

File	Purpose
`base.py`	`LLMProvider` protocol; `LLMProviderConfigurationError`
`registry.py`	Thread-safe singleton registry (`register`, `get`, `reset`, `available`)
`local_provider.py`	llama-cpp-python in-process GGUF
`ollama_provider.py`	Ollama HTTP API
`mock_provider.py`	Deterministic test provider; records all calls
`__init__.py`	Registers built-ins; discovers third-party providers via entry points

You can list registered providers at runtime:

from zettelforge.llm_providers import available
print(available())
# ['litellm', 'local', 'mock', 'ollama']  — litellm appears when installed

The LLMProvider protocol:

from typing import Optional, Protocol, runtime_checkable

@runtime_checkable
class LLMProvider(Protocol):
    name: str  # e.g., "local", "ollama", "litellm"

    def generate(
        self,
        prompt: str,
        max_tokens: int = 400,
        temperature: float = 0.1,
        system: Optional[str] = None,
        json_mode: bool = False,
    ) -> str:
        """Generate text from a prompt. Returns empty string on failure."""
        ...

generate() callers are unchanged. All seven call sites (fact_extractor.py, memory_updater.py, synthesis_generator.py, intent_classifier.py, note_constructor.py, entity_indexer.py, memory_evolver.py) route through the registry automatically.

Config changes (Phase 1)¶

LLMConfig in config.py expanded from the original two fields to:

Field	Default	Notes
`provider`	`"ollama"`	Built-in providers: `local`, `ollama`, `mock`, `litellm`
`model`	`"qwen3.5:9b"`	Meaning depends on provider
`url`	`""`	Provider base URL (Ollama, OpenAI-compat)
`api_key`	`""`	Supports `${ENV_VAR}` references — never commit raw keys
`temperature`	`0.1`	Generation temperature
`timeout`	`180.0`	Seconds; bumped from 60 in v2.5.2 for reasoning models
`max_retries`	`2`	Retries on transient failure
`fallback`	`""`	Empty preserves implicit `local → ollama` fallback
`reasoning_model`	`False`	Enables scaling floors for reasoning-optimized models

Current docs note: qwen3.5:9b is the verified v2.7.0 source default in config.py, not a verified working Ollama tag. Set llm.model or ZETTELFORGE_LLM_MODEL to a model your provider can load. The Ollama provider's own fallback default is qwen2.5:3b only when no model is passed to it.

api_key supports ${ENV_VAR} syntax. Never put a raw key in a file you might commit to version control:

llm:
  provider: litellm
  model: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}

LLMConfig.__repr__ redacts api_key so it never appears in logs or debug output.

Env var overrides¶

All overrides are applied at startup as the highest-priority source:

Env var	Config field
`ZETTELFORGE_LLM_PROVIDER`	`llm.provider`
`ZETTELFORGE_LLM_MODEL`	`llm.model`
`ZETTELFORGE_LLM_API_KEY`	`llm.api_key`
`ZETTELFORGE_LLM_TIMEOUT`	`llm.timeout`
`ZETTELFORGE_LLM_MAX_RETRIES`	`llm.max_retries`
`ZETTELFORGE_LLM_FALLBACK`	`llm.fallback`
`ZETTELFORGE_OLLAMA_MODEL`	Deprecated alias for `ZETTELFORGE_LLM_MODEL`

ZETTELFORGE_OLLAMA_MODEL is deprecated. Use ZETTELFORGE_LLM_MODEL. The alias remains functional but emits a deprecation warning. It is scheduled for removal in v3.0.0.

Third-party provider extension points¶

Third-party packages register providers via Python entry points — no ZettelForge source changes required:

# In a third-party package's pyproject.toml
[project.entry-points."zettelforge.llm_providers"]
my_provider = "my_package.provider:MyProvider"

ZettelForge discovers and loads these at import time via importlib.metadata. Load failures are logged at DEBUG level so a broken plugin never blocks startup.

Fallback behavior¶

The implicit local → ollama fallback from before Phase 1 is preserved. Fallback is now explicit and configurable:

# In generate():
# 1. Try primary provider.
# 2. ConfigurationErrors (bad key, missing SDK) propagate immediately — not retryable.
# 3. Transient errors try the fallback if configured.
# 4. If no fallback, local → ollama implicit fallback applies.
# 5. Total failure returns "".

Cloud provider coverage — RFC-012 (supersedes Phases 2–3)¶

RFC-002 Phases 2–3 proposed a custom openai_compat provider (httpx-based) and a native AnthropicProvider. These were not implemented as described. Instead, RFC-012 delivered a LiteLLMProvider that covers 100+ LLM providers through a single interface, including OpenAI, Anthropic, Google Vertex, AWS Bedrock, Azure OpenAI, Groq, Together AI, Fireworks, and any OpenAI-compatible endpoint.

Install the optional extra:

pip install zettelforge[litellm]

Then configure:

# OpenAI
llm:
  provider: litellm
  model: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}

# Anthropic
llm:
  provider: litellm
  model: claude-sonnet-4-20250514
  api_key: ${ANTHROPIC_API_KEY}

# Azure OpenAI
llm:
  provider: litellm
  model: azure/gpt-4o
  api_key: ${AZURE_OPENAI_KEY}

# vLLM / any OpenAI-compatible server
llm:
  provider: litellm
  model: openai/Qwen/Qwen2.5-72B-Instruct
  url: http://gpu-box:8000/v1

LiteLLM routes to the correct provider SDK based on the model name prefix. See RFC-012 for full configuration and provider matrix.

What did not ship¶

Phase	Scope	Status
Phase 1	Provider infrastructure (`llm_providers/` package, registry, `local`/`ollama`/`mock` providers, expanded `LLMConfig`, env-ref resolution, entry-point discovery)	Shipped v2.3.0
Phase 2	`openai_compat` provider (httpx-based, covers OpenAI/vLLM/Groq etc.)	Superseded by RFC-012
Phase 3	`anthropic` native SDK provider	Superseded by RFC-012
Phase 4	Full env-var documentation, per-extra env resolution, third-party entry-point guide	Partially shipped with Phase 1; docs complete as of this page
Phase 5	Per-call `provider=` override on `generate()`, Azure OpenAI dedicated provider	Planned

Phase 5 (generate(prompt, provider="litellm")) would allow per-call provider selection — useful for routing synthesis calls to a powerful model while using a fast model for classification. It is backward compatible (defaults to None, falls back to global config). Not yet implemented.

Migration¶

Existing users with default config¶

No changes required. The registry wraps both local and ollama with identical behavior. The local → ollama implicit fallback is preserved.

Existing users with `provider: ollama` in config.yaml¶

No changes required. ollama is a built-in provider with identical behavior.

Existing env var users¶

All existing env vars continue to work. New env vars from Phase 1 are available immediately after upgrading to v2.3.0+.

Alternatives considered¶

LiteLLM as a core dependency (rejected for core, accepted as optional). LiteLLM provides a unified interface to 100+ providers but pulls in substantial transitive dependencies. Keeping it as zettelforge[litellm] avoids bloating the core install for offline users. RFC-012 confirms this tradeoff.

OpenAI SDK as universal client (rejected). The openai Python SDK covers OpenAI-compatible endpoints via base_url but adds a required dependency and doesn't cover Anthropic. RFC-012 LiteLLM achieves the same coverage without committing to the OpenAI SDK as a mandatory dependency.

Abstract base class instead of Protocol (rejected). Forces inheritance, inconsistent with the rest of the codebase, and unnecessary for a single-method interface.

Provider config as separate YAML sections (rejected). Proliferates top-level config sections. A single llm: section with a provider: discriminator is simpler and mirrors the existing embedding: pattern.

Decision¶

Accepted — 2026-04-16, Patrick Roland. Adversarial review completed with 3 blockers (all fixed), 7 warnings (6 addressed, Azure deferred), 5 nits (3 fixed). Open questions resolved: Azure deferred to RFC-012 LiteLLM, per-call provider override approved for Phase 5, streaming excluded, Retry-After support approved for Phase 2 (delivered via RFC-012 LiteLLM), ZETTELFORGE_OLLAMA_MODEL deprecated.