LLM Provider Integration

This document describes the LLM provider abstraction layer in sase. The system supports pluggable LLM backends (Claude Code, Codex, Gemini CLI, Qwen Code, and OpenCode are bundled; additional providers can ship as external plugins) behind a shared orchestration layer that handles preprocessing, invocation, and postprocessing.

Overview

The LLM provider layer decouples prompt handling from the underlying LLM backend. All providers share a common preprocessing pipeline, subprocess streaming mechanism, and postprocessing workflow. The actual LLM invocation is delegated to a pluggable provider selected at runtime.

Key design principles:

  • Providers are thin: They only construct CLI commands and run subprocesses. All preprocessing and postprocessing lives in the shared orchestration layer.
  • Registry-based selection: Providers register themselves by name and are resolved via config or explicit override.
  • Tier-based model selection: Callers request a "large" or "small" tier; the provider maps it to a concrete model.

Source Layout

File Purpose
src/sase/llm_provider/__init__.py Public API exports
src/sase/llm_provider/base.py LLMProvider abstract base class
src/sase/llm_provider/_hookspec.py Pluggy hook specifications (LLMHookSpec)
src/sase/llm_provider/_plugin_manager.py Plugin manager wrapping pluggy (LLMPluginManager)
src/sase/llm_provider/claude.py Claude Code provider implementation
src/sase/llm_provider/gemini.py Gemini CLI provider implementation
src/sase/llm_provider/qwen.py Qwen Code provider implementation
src/sase/llm_provider/opencode.py OpenCode provider implementation
src/sase/llm_provider/registry.py Provider registration and lookup
src/sase/llm_provider/config.py Config file reader (sase.yml)
src/sase/llm_provider/types.py ModelTier, LoggingContext types
src/sase/llm_provider/_invoke.py invoke_agent() orchestrator
src/sase/llm_provider/_subprocess.py stream_process_output()
src/sase/llm_provider/codex.py Codex CLI provider implementation
src/sase/llm_provider/_plan_utils.py Shared plan utilities
src/sase/llm_provider/preprocessing.py 6-step preprocessing pipeline
src/sase/llm_provider/postprocessing.py Logging, chat history, audio
src/sase/llm_provider/retry_config.py ProviderRetryConfig (per-provider retry defaults)

Provider Architecture

Base Class

All providers implement the LLMProvider abstract base class:

class LLMProvider(ABC):
    @abstractmethod
    def invoke(
        self,
        prompt: str,
        *,
        model_tier: ModelTier,
        suppress_output: bool = False,
    ) -> str: ...
Parameter Type Description
prompt str Already-preprocessed prompt text
model_tier ModelTier "large" or "small"
suppress_output bool If True, suppress real-time console output

Returns the raw response text. Raises subprocess.CalledProcessError on failure.
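
As an illustrative sketch (not a bundled provider), a minimal subclass might look like the following. The class name is hypothetical and `cat` stands in for a real LLM CLI; only the base class and types modules come from the source layout above.

import subprocess

from sase.llm_provider.base import LLMProvider
from sase.llm_provider.types import ModelTier


class EchoProvider(LLMProvider):
    """Hypothetical provider: `cat` stands in for an LLM CLI that reads the prompt on stdin."""

    _MODELS = {"large": "echo-large", "small": "echo-small"}

    def invoke(
        self,
        prompt: str,
        *,
        model_tier: ModelTier,
        suppress_output: bool = False,
    ) -> str:
        model = self._MODELS[model_tier]  # a real provider would pass this as a --model flag
        # check=True mirrors the contract: subprocess.CalledProcessError on non-zero exit.
        result = subprocess.run(["cat"], input=prompt, capture_output=True, text=True, check=True)
        if not suppress_output:
            print(result.stdout, end="")
        return result.stdout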

Registry

Providers are registered by name in a global registry (registry.py) and discovered via importlib.metadata.entry_points(group="sase_llm"). Built-in entries live in pyproject.toml:

[project.entry-points."sase_llm"]
claude = "sase.llm_provider.claude:ClaudeCodeProvider"
codex  = "sase.llm_provider.codex:CodexProvider"
gemini = "sase.llm_provider.gemini:GeminiProvider"
opencode = "sase.llm_provider.opencode:OpenCodeProvider"
qwen   = "sase.llm_provider.qwen:QwenProvider"

External plugin packages declare additional entries under the same group.

To get a provider instance:

provider = get_provider()          # Uses default from config
provider = get_provider("claude")  # Explicit provider name

Selection Logic

  1. If provider_name is passed to invoke_agent(), use that.
  2. Otherwise, read the llm_provider.provider field from ~/.config/sase/sase.yml.
  3. If no config exists (or provider is empty), auto-detect by walking registered plugins in ascending llm_autodetect_priority() order and picking the first whose llm_autodetect_cli_name() is on PATH. Built-in priorities: claude=0, codex=10, qwen=15, opencode=18, gemini=30. External plugins slot in by declaring their own priority.
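
A simplified sketch of that auto-detect walk (step 3), assuming plugin objects exposing the two hooks named above; the helper is illustrative, not the actual registry code.

import shutil


def autodetect_provider(plugins):
    """Pick the first registered plugin whose CLI binary is on PATH."""
    for plugin in sorted(plugins, key=lambda p: p.llm_autodetect_priority()):
        if shutil.which(plugin.llm_autodetect_cli_name()):
            return plugin
    return None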

Claude Code Integration

The ClaudeCodeProvider invokes the claude CLI tool.

Command Construction

claude -p --model <alias> --output-format text --dangerously-skip-permissions [extra_args...]

The prompt is written to stdin, and output is streamed from stdout in real-time.

Model Mapping

Tier Claude CLI Alias
large opus
small sonnet

Environment Variables

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier (generic, preferred)
SASE_LLM_SMALL_ARGS Extra CLI args for small tier (generic, preferred)
SASE_CLAUDE_LARGE_ARGS Extra CLI args for large tier (Claude-specific fallback)
SASE_CLAUDE_SMALL_ARGS Extra CLI args for small tier (Claude-specific fallback)

The generic SASE_LLM_*_ARGS variables take precedence. Values are split on whitespace and appended to the command.
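
A small sketch of that precedence and whitespace splitting (the helper name is illustrative):

import os


def extra_args_for_tier(tier: str) -> list[str]:
    """Return extra CLI args for a tier, preferring the generic variable over the Claude-specific one."""
    generic = os.environ.get(f"SASE_LLM_{tier.upper()}_ARGS")
    specific = os.environ.get(f"SASE_CLAUDE_{tier.upper()}_ARGS")
    value = generic if generic is not None else specific
    return value.split() if value else []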

Timer Display

While waiting for a response, a gemini_timer("Waiting for Claude") spinner is shown (unless suppress_output is True).

Gemini CLI Integration

The GeminiProvider invokes Google's Gemini CLI tool.

Command Construction

gemini --yolo [extra_args...]

The prompt is written to stdin, and output is streamed from stdout in real-time.

Default Model

The Gemini provider uses gemini-3-flash-preview as its default model. This can be overridden per-prompt using the %model directive (e.g., %model:gemini-2.5-flash).

Environment Variables

Variable Description
SASE_GEMINI_PATH Path to the Gemini CLI binary (default: "gemini").

Timer Display

While waiting for a response, a gemini_timer("Waiting for Gemini") spinner is shown (unless suppress_output is True).

Codex CLI Integration

The CodexProvider invokes the OpenAI codex CLI tool.

Command Construction

Normal mode:

codex exec --model <model> --dangerously-bypass-approvals-and-sandbox --json --color never --skip-git-repo-check - [extra_args...]

The prompt is written to stdin. Output is streamed as NDJSON events, with assistant text extracted from item.completed events.
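
A sketch of extracting assistant text from that NDJSON stream; the exact event payload layout (item["text"]) is an assumption beyond what this document states.

import json


def extract_codex_text(ndjson_lines):
    """Collect assistant text from item.completed events in a Codex NDJSON stream."""
    chunks = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") == "item.completed":
            text = event.get("item", {}).get("text")  # payload shape assumed
            if text:
                chunks.append(text)
    return "\n".join(chunks)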

Model Mapping

Tier Codex Model
large gpt-5.5
small codex-mini-latest

Plan Mode

When SASE_AGENT_PLAN_MODE is set, Codex runs a two-phase plan/implement flow:

  1. Phase 1 (Planning): Runs with --sandbox read-only and --ask-for-approval on-request. The model generates a plan captured via --output-last-message, on-disk plan files, or streamed response text.
  2. Approval: The plan is presented for user approval with up to 5 feedback-retry rounds.
  3. Phase 2 (Implementation): On approval, runs with full permissions (--dangerously-bypass-approvals-and-sandbox) using the plan content as the prompt.

Environment Variables

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier (generic, preferred)
SASE_LLM_SMALL_ARGS Extra CLI args for small tier (generic, preferred)
SASE_CODEX_PATH Path to the Codex CLI binary (default: PATH, then NVM_BIN)
SASE_CODEX_LARGE_ARGS Extra CLI args for large tier (Codex-specific fallback)
SASE_CODEX_SMALL_ARGS Extra CLI args for small tier (Codex-specific fallback)
SASE_CODEX_DISABLE_SHADOW_HOME Set to 1 to disable the disposable Codex home
SASE_AGENT_PLAN_MODE Enable two-phase plan/implement flow

The generic SASE_LLM_*_ARGS variables take precedence over SASE_CODEX_*_ARGS.

By default, SASE launches Codex with a per-invocation shadow CODEX_HOME under ~/.cache/sase/codex_home/. The shadow home copies config.toml and symlinks other Codex home entries back to the real Codex home so Codex can read auth, hooks, skills, logs, and caches while any config rewrites stay disposable. The shadow directory is removed after each Codex subprocess exits. Set SASE_CODEX_DISABLE_SHADOW_HOME=1 to pass through the inherited environment directly for debugging or emergency compatibility.
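
A minimal sketch of building such a shadow home, assuming the real Codex home is ~/.codex; the function name and cleanup details are illustrative.

import shutil
import tempfile
from pathlib import Path


def make_shadow_codex_home(real_home: Path = Path.home() / ".codex") -> Path:
    """Create a disposable CODEX_HOME: copy config.toml, symlink everything else back."""
    shadow_root = Path.home() / ".cache" / "sase" / "codex_home"
    shadow_root.mkdir(parents=True, exist_ok=True)
    shadow = Path(tempfile.mkdtemp(dir=shadow_root))
    for entry in real_home.iterdir():
        if entry.name == "config.toml":
            shutil.copy2(entry, shadow / entry.name)   # config rewrites stay disposable
        else:
            (shadow / entry.name).symlink_to(entry)    # auth, hooks, skills, logs, caches
    return shadow

# Usage sketch: run Codex with env={**os.environ, "CODEX_HOME": str(shadow)} and
# shutil.rmtree(shadow) after the subprocess exits.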

Timer Display

While waiting for a response, a gemini_timer("Waiting for Codex") spinner is shown (unless suppress_output is True). In plan mode, the timer reads "Waiting for Codex (planning)" during Phase 1 and "Implementing plan" during Phase 2.

Qwen Code Integration

The QwenProvider invokes the qwen CLI tool.

Command Construction

qwen --input-format text --output-format stream-json --yolo --model <model> [extra_args...]

The prompt is written to stdin using Qwen's text input mode. Output is streamed as JSON events; SASE extracts assistant text from assistant events and falls back to the final result text when no assistant text is emitted.

Model Mapping

Tier Qwen Model
large qwen3-coder-plus
small qwen3-coder-flash

Authentication

Configure Qwen Code through its supported auth and settings flow before using it from SASE. Qwen OAuth free tier access ended on 2026-04-15; use API keys, Alibaba Cloud Coding Plan, OpenRouter, Fireworks, or another Qwen-supported provider instead of relying on the discontinued OAuth free tier.

Environment Variables

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier (generic, preferred)
SASE_LLM_SMALL_ARGS Extra CLI args for small tier (generic, preferred)
SASE_QWEN_PATH Path to the Qwen Code CLI binary (default: qwen)
SASE_QWEN_LARGE_ARGS Extra CLI args for large tier (Qwen-specific fallback)
SASE_QWEN_SMALL_ARGS Extra CLI args for small tier (Qwen-specific fallback)

The generic SASE_LLM_*_ARGS variables take precedence over SASE_QWEN_*_ARGS.

Qwen Code config is left in Qwen's normal locations (~/.qwen/settings.json and project .qwen/settings.json). SASE does not create a shadow Qwen home in this first implementation: a local Qwen install was unavailable during this phase, so it could not be verified whether normal headless runs mutate that config.

Timer Display

While waiting for a response, a gemini_timer("Waiting for Qwen") spinner is shown (unless suppress_output is True).

OpenCode Integration

The OpenCodeProvider invokes the opencode CLI tool.

Command Construction

opencode run --format json --dangerously-skip-permissions --model <provider/model> --dir <cwd> [extra_args...] <prompt>

The prompt is passed as OpenCode's run [message..] argument without shell interpolation. Output is streamed as JSONL events; SASE extracts assistant text from text events, captures errors from error events, and accumulates token counters from step_finish events when OpenCode reports them.
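
A sketch of that event handling; the payload field names ("text", "error", "tokens") are assumptions, only the event types are documented above.

import json


def consume_opencode_events(jsonl_lines):
    """Accumulate assistant text, errors, and token counters from OpenCode JSONL output."""
    text, errors, tokens = [], [], {"input": 0, "output": 0}
    for line in jsonl_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        kind = event.get("type")
        if kind == "text":
            text.append(event.get("text", ""))
        elif kind == "error":
            errors.append(event.get("error", ""))
        elif kind == "step_finish":
            usage = event.get("tokens", {})            # counters only when OpenCode reports them
            tokens["input"] += usage.get("input", 0)
            tokens["output"] += usage.get("output", 0)
    return "".join(text), errors, tokens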

Model Mapping

OpenCode model IDs normally include an upstream provider prefix. Use %model:opencode/<provider/model> to route a single SASE prompt to a concrete OpenCode model.

Tier OpenCode Model
large anthropic/claude-sonnet-4-5
small openai/gpt-5-mini

Authentication and Config

Configure OpenCode through its normal auth and settings flow before using it from SASE. OpenCode stores auth under its XDG data directory and reads config from its XDG config directory plus project .opencode config. Use opencode models to inspect the models available in your configured OpenCode environment.

SASE deploys OpenCode skills under ~/.config/opencode/skills/, which OpenCode scans as part of its config directory. SASE does not create a shadow OpenCode data/config home in this first implementation because OpenCode's normal headless run writes session/database state under its XDG data directory while reading auth/config from the standard locations.

Environment Variables

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier (generic, preferred)
SASE_LLM_SMALL_ARGS Extra CLI args for small tier (generic, preferred)
SASE_OPENCODE_PATH Path to the OpenCode CLI binary (default: opencode)
SASE_OPENCODE_LARGE_ARGS Extra CLI args for large tier (OpenCode-specific fallback)
SASE_OPENCODE_SMALL_ARGS Extra CLI args for small tier (OpenCode-specific fallback)

The generic SASE_LLM_*_ARGS variables take precedence over SASE_OPENCODE_*_ARGS.

Timer Display

While waiting for a response, a gemini_timer("Waiting for OpenCode") spinner is shown (unless suppress_output is True).

External Provider Plugins

Additional LLM providers are shipped as external packages that declare [project.entry-points."sase_llm"] in their own pyproject.toml. Plugins carry all their own metadata (model names, skill deploy path, CLI status color, auto-detect priority, retry defaults) via pluggy @hookimpl methods — sase core has no plugin-specific branching.

External provider packages own their CLI invocation details, model metadata, skill deployment path, auto-detect priority, and retry defaults. Install the provider package in the same environment as sase to make its sase_llm entry point available.
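
A sketch of the plugin class such a package might expose (its entry point is declared in pyproject.toml under the same group shown for the built-ins above). The plugin and model names are invented; the hook names are the ones mentioned in this document, and the hookimpl project name is assumed to match the entry-point group.

import pluggy

hookimpl = pluggy.HookimplMarker("sase_llm")  # project name assumed; see LLMPluginManager


class MyLLMPlugin:
    """Hypothetical external plugin implementing the hooks this document mentions."""

    @hookimpl
    def llm_autodetect_cli_name(self):
        return "myllm"  # binary probed on PATH during auto-detection

    @hookimpl
    def llm_autodetect_priority(self):
        return 50  # slots in after the built-in providers

    @hookimpl
    def llm_known_model_names(self):
        return {"myllm-pro", "myllm-mini"}  # enables bare %model resolution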

Configuration

The LLM provider reads its configuration from ~/.config/sase/sase.yml under the llm_provider key.

Config File

llm_provider:
  provider: claude # or "codex", "gemini", "qwen", "opencode" (default: auto-detect)
  model_tier_map:
    large: opus
    small: sonnet

Config Fields

Field Type Default Description
llm_provider.provider string auto-detect Which registered provider to use. Auto-detects by plugin-declared priority; built-ins default to claude → codex → qwen → opencode → gemini.
llm_provider.model_tier_map.large string - Model identifier for the large tier
llm_provider.model_tier_map.small string - Model identifier for the small tier

Per-Prompt Provider Switching

The %model directive (see xprompt directives) can switch both the model and the LLM provider for a single prompt. Provider resolution uses two strategies:

Explicit Provider/Model Syntax

Use provider/model to specify both explicitly:

%model:codex/o3
%model:claude/opus
%model:gemini/gemini-2.5-pro
%model:qwen/qwen3-coder-plus
%model:opencode/anthropic/claude-sonnet-4-5

Automatic Provider Resolution

Known model names are automatically mapped to their provider:

Model Name Provider
opus, sonnet, haiku claude
gpt-5.5, gpt-5.3-codex, codex-mini-latest, o3, o4-mini, gpt-5.4, gpt-4.1, gpt-4.1-mini, gpt-4o, gpt-4o-mini codex
gemini-2.5-pro, gemini-2.5-flash, gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-2.0-flash gemini
qwen3-coder-plus, qwen3-coder-flash, qwen3-max, qwen-plus, qwen-max qwen
anthropic/claude-sonnet-4-5, openai/gpt-5-mini, qwen/qwen3-coder-plus opencode

Each installed plugin contributes its own model names via the llm_known_model_names() hook.

For unrecognized model names, the default provider is used.

Source: src/sase/llm_provider/registry.py

Model Tier System

The model tier system abstracts away specific model names. Callers request either "large" (most capable) or "small" (faster/cheaper), and the provider maps the tier to a concrete model.

Type Definition

ModelTier = Literal["large", "small"]

Legacy Mapping

The old "big"/"little" terminology is still supported for backward compatibility:

Old Value New Tier Display Label
"big" "large" BIG
"little" "small" LITTLE

The model_size parameter on invoke_agent() is deprecated. Use model_tier instead.

Global Override

The model tier can be overridden globally via environment variable or CLI flag. The override forces ALL invocations to use the specified tier regardless of what the caller requests.

Resolution order:

  1. SASE_MODEL_TIER_OVERRIDE env var (accepts "large", "small", "big", "little")
  2. SASE_MODEL_SIZE_OVERRIDE env var (legacy, same values)
  3. --model-tier / --model-size CLI flag (sets the env var)
  4. Caller's model_tier parameter (default: "large")
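
A sketch of that resolution order (the helper name is illustrative; the CLI flag is covered because it sets the env var):

import os

_TIER_ALIASES = {"big": "large", "little": "small", "large": "large", "small": "small"}


def resolve_model_tier(requested: str = "large") -> str:
    """Apply the documented order: tier override env var, legacy env var, then the caller's tier."""
    for var in ("SASE_MODEL_TIER_OVERRIDE", "SASE_MODEL_SIZE_OVERRIDE"):
        value = os.environ.get(var, "").strip().lower()
        if value in _TIER_ALIASES:
            return _TIER_ALIASES[value]
    return _TIER_ALIASES.get(requested, "large")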

Temporary Default Override

In addition to the tier-based global override, sase supports a concrete provider/model override that acts as a temporary session-level default. This is the override the ACE ,P chord writes (see docs/ace.md for the TUI flow).

The temporary override only changes the default provider/model selection for new agent launches. It does not affect:

  • Already-running agents — they keep whatever provider/model they were launched with.
  • Explicit %model prompt directives — they still take precedence.
  • An explicit provider_name= argument to invoke_agent() — it still wins.
  • SASE_MODEL_TIER_OVERRIDE / SASE_MODEL_SIZE_OVERRIDE — those force a tier across all invocations regardless of this override; they layer on top, not under.

Resolution Order (default provider/model)

When no %model directive and no explicit provider_name are present, the default is resolved as:

  1. Active temporary override at ~/.sase/llm_override.json (if not expired).
  2. llm_provider.provider from the merged sase.yml config.
  3. Auto-detection by plugin-declared priority (built-ins: claude, codex, qwen, opencode, then gemini).

A concrete temporary override sets both the default provider and a concrete model_override for the next launch — so the agent metadata (running marker, plan review badge, agent rows) reflects the actual model that will run, not just the configured default.

State File

{
  "provider": "opencode",
  "model": "anthropic/claude-sonnet-4-5",
  "raw_model": "opencode/anthropic/claude-sonnet-4-5",
  "created_at": 1777470000.0,
  "expires_at": 1777473600.0,
  "source": "ace"
}
Field Type Description
provider str Resolved provider name (e.g. "claude", "codex", "opencode").
model str Concrete model passed to the provider (e.g. "o3", "opus").
raw_model str Original user input (e.g. "codex/o3", "opencode/anthropic/...").
created_at float Unix timestamp when the override was set.
expires_at float \| None Unix timestamp when the override expires; null means "until cleared".
source str Free-form tag indicating who set the override (e.g. "ace").

Writes are atomic (temp file + os.replace). Reads are best-effort self-cleaning: an expired or unparseable file is deleted on next access, so a forgotten override never lingers past its expires_at, even with no TUI running.
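
A sketch of that write/read discipline, assuming the JSON layout shown above (helper names are illustrative, not the actual temporary_override.py API):

import json
import os
import tempfile
import time
from pathlib import Path

OVERRIDE_PATH = Path.home() / ".sase" / "llm_override.json"


def write_override(data: dict) -> None:
    """Atomic write: temp file in the same directory, then os.replace()."""
    OVERRIDE_PATH.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=OVERRIDE_PATH.parent)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh)
    os.replace(tmp, OVERRIDE_PATH)


def read_override(now: float | None = None) -> dict | None:
    """Best-effort self-cleaning read: delete expired or unparseable files and return None."""
    now = time.time() if now is None else now
    try:
        data = json.loads(OVERRIDE_PATH.read_text())
        expires = data.get("expires_at")
        if expires is not None and now >= expires:
            raise ValueError("override expired")
        return data
    except FileNotFoundError:
        return None
    except (ValueError, OSError):
        OVERRIDE_PATH.unlink(missing_ok=True)
        return None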

Model Resolution

The user-supplied raw_model is normalized through the same rules as %model:

  • provider/model selects the provider explicitly (e.g. codex/o3 or opencode/anthropic/claude-sonnet-4-5).
  • A bare known model name infers its provider from plugin metadata (e.g. sonnet → claude).
  • An unknown bare model is accepted and runs on the current default provider, matching %model behavior.

Duration Parsing

Durations accept compact unit suffixes: 15m, 1h, 1h30m, 90m, 2h15m30s. Bare integers are interpreted as minutes (45 → 45 minutes). The case-insensitive sentinel until cleared (or until_cleared) means "no expiry — persists until the user clears it from the TUI or another sase process clears the state file."
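
A sketch of that duration grammar; the function mirrors the described behavior of parse_override_duration but is illustrative.

import re


def parse_duration_seconds(value: str) -> int | None:
    """Parse '15m', '1h30m', '2h15m30s', bare minutes, or the 'until cleared' sentinel.

    Returns seconds, or None for "no expiry".
    """
    text = value.strip().lower()
    if text in ("until cleared", "until_cleared"):
        return None
    if text.isdigit():                                   # bare integer → minutes
        return int(text) * 60
    matches = re.findall(r"(\d+)([hms])", text)
    if not matches or "".join(n + u for n, u in matches) != text:
        raise ValueError(f"unrecognized duration: {value!r}")
    factors = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * factors[u] for n, u in matches)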

Public API

The override primitives live in src/sase/llm_provider/temporary_override.py:

Function Purpose
get_active_temporary_override(now=None) Read the active override (auto-deletes expired/malformed files).
set_temporary_override(raw, dur, source=...) Write a new override, replacing any existing one.
clear_temporary_override() Remove the override file. Safe to call when nothing is active.
parse_override_duration(value) Parse a user-facing duration string into seconds (or None).
resolve_effective_default_provider_model() Centralized helper used by metadata pre-resolution paths.

Examples

  • ACE chord ,P, pick codex/o3, duration 1h → ~/.sase/llm_override.json is written; new launches default to CODEX(o3) for the next hour.
  • ACE chord ,P, pick opencode/anthropic/claude-sonnet-4-5, duration 1h → new launches default to OPENCODE(anthropic/claude-sonnet-4-5).
  • ACE chord ,P, pick sonnet, duration 30m → known bare model; provider resolves to claude via plugin metadata.
  • ACE chord ,P, choose Clear override → ~/.sase/llm_override.json is removed; defaults revert to permanent config / autodetect.

Environment Variables

Complete reference of environment variables used by the LLM provider layer.

Generic (Provider-Agnostic)

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier invocations
SASE_LLM_SMALL_ARGS Extra CLI args for small tier invocations
SASE_MODEL_TIER_OVERRIDE Force all invocations to a specific model tier
SASE_MODEL_SIZE_OVERRIDE Legacy alias for SASE_MODEL_TIER_OVERRIDE

Claude-Specific

Variable Description
SASE_CLAUDE_LARGE_ARGS Claude-specific extra args for large tier
SASE_CLAUDE_SMALL_ARGS Claude-specific extra args for small tier

Codex-Specific

Variable Description
SASE_CODEX_LARGE_ARGS Codex-specific extra args for large tier
SASE_CODEX_SMALL_ARGS Codex-specific extra args for small tier
SASE_AGENT_PLAN_MODE Enable Codex two-phase plan/implement flow

Qwen-Specific

Variable Description
SASE_QWEN_PATH Path to the Qwen Code CLI binary
SASE_QWEN_LARGE_ARGS Qwen-specific extra args for large tier
SASE_QWEN_SMALL_ARGS Qwen-specific extra args for small tier

Gemini-Specific

Variable Description
SASE_GEMINI_PATH Path to the Gemini CLI binary (default: "gemini").

External provider plugins document their own environment variables in their respective repos.

VCS Provider

Variable Description
SASE_VCS_PROVIDER Override VCS provider ("git", "hg", or "auto")

CLI Flags

ace

Flag Values Description
-m, --model-tier large, small Override model tier for all LLM invocations
--model-size big, little Deprecated alias for --model-tier
--vcs-provider git, hg, auto Override VCS provider

axe

Flag Values Description
--vcs-provider git, hg, auto Override VCS provider

The ace command wires --model-tier / --model-size into the model_tier_override parameter of AceApp. The --vcs-provider flag is wired to the SASE_VCS_PROVIDER environment variable for downstream resolution.

Retry and Fallback

The LLM provider layer supports per-provider retry and fallback configuration. When an agent encounters a retryable error, it can automatically wait and retry, then optionally fall back to an alternate model.

Configuration

Retry behavior is configured per provider under llm_provider.retry in sase.yml:

llm_provider:
  retry:
    gemini:
      max_retries: 3
      error_patterns:
        - "An unexpected critical error occurred:"
      wait_times: [60, 300, 1800]
      fallback_model: "gemini-3-flash-preview"

Config Fields

Field Type Default Description
max_retries int 0 Maximum retry attempts. 0 disables retrying.
error_patterns list[str] [] Case-insensitive substring patterns matched against error output.
wait_times list[int] [30] Per-retry wait times in seconds. Last value reused if list is too short.
fallback_model str|null null Alternate model to use after exhausting all retries.
continuation_prompt str "" Text prepended to state.current_prompt on every retry (used to nudge the agent).
spawn_new_agent bool false Opt in to spawn-on-retry: a retryable error spawns a fresh detached child agent (as if sase run -d had been invoked) instead of in-process retry. See Spawn-on-Retry below.

Default Configuration

Gemini and Claude have retry defaults (defined in default_config.yml); external provider plugins may declare their own via the llm_default_retry_config() hook.

Gemini:

  • max_retries: 3
  • error_patterns: ["An unexpected critical error occurred:"]
  • wait_times: [60, 300, 1800] (1 min, 5 min, 30 min)
  • fallback_model: "gemini-3-flash-preview"

Claude:

  • max_retries: 3
  • error_patterns: ["API Error: 500", "API Error: 529", "Internal server error", "overloaded_error"]
  • wait_times: [60, 300, 1800] (1 min, 5 min, 30 min)
  • fallback_model: "sonnet"

Built-In "Prompt is too long" Recovery (Claude)

Claude has an additional built-in retry entry registered internally (not in default_config.yml) that auto-recovers agents from context-overflow errors without any user config:

  • error pattern: "Prompt is too long"
  • max_retries: 3
  • wait_times: [0] — zero-delay retry so a fresh session restarts immediately
  • continuation_prompt: A short nudge that tells the coder to inspect git status / git diff before resuming, since prior edits are preserved on disk when the retry wipes only the in-memory context

User-supplied llm_provider.retry.claude config is merged on top of these built-ins: explicit falsy values (max_retries: 0 to opt out entirely, continuation_prompt: "" to disable the nudge) override the built-in via key-presence checks. error_patterns is a de-duplicated union of built-in and user lists.

On every retry attempt the continuation_prompt (if non-empty) is idempotently prepended to state.current_prompt before the next invocation — the prepend is gated on a startswith check so repeated retries don't stack duplicate nudges. Workspaces are preserved across built-in context-overflow retries (no workspace wipe), so on-disk edits remain available to the restarted session.
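
A sketch of that idempotent prepend (the state object's shape is assumed):

def apply_continuation_prompt(state, continuation_prompt: str) -> None:
    """Prepend the retry nudge once; repeated retries must not stack duplicate nudges."""
    if not continuation_prompt:
        return
    if not state.current_prompt.startswith(continuation_prompt):
        state.current_prompt = continuation_prompt + "\n\n" + state.current_prompt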

Retry Flow

Error detected
│
├── Does error match error_patterns? (case-insensitive substring)
│   ├── No  → fail immediately
│   └── Yes → retry_count < max_retries?
│       ├── Yes → wait (wait_times[retry_count]) → retry
│       └── No  → fallback_model configured?
│           ├── Yes → switch model via SASE_MODEL_OVERRIDE → retry once
│           └── No  → fail

Wait periods are interruptible — if the agent is killed during a wait, it stops immediately.
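
The same flow as a Python sketch. `invoke(model_override=None)` is an assumed callable returning (ok, error_text), and `config` is assumed to carry the fields from the table above; this is not the actual run_agent_exec code.

import time


def run_with_retry(invoke, config, is_interrupted=lambda: False):
    """Drive the documented flow: pattern match, bounded retries, optional fallback."""
    retry_count = 0
    while True:
        ok, error_text = invoke()
        if ok:
            return True
        if not any(p.lower() in error_text.lower() for p in config.error_patterns):
            return False  # error does not match any pattern: fail immediately
        if retry_count < config.max_retries:
            waits = config.wait_times or [30]
            wait = waits[min(retry_count, len(waits) - 1)]  # last value reused if list is short
            for _ in range(wait):                            # interruptible one-second sleeps
                if is_interrupted():
                    return False
                time.sleep(1)
            retry_count += 1
            continue
        if config.fallback_model:
            ok, _ = invoke(model_override=config.fallback_model)
            return ok  # one final attempt on the fallback model
        return False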

TUI Display

The ACE Agents tab reflects retry state (see Retry/Fallback Display):

  • RETRYING (Ns) — Waiting before the next attempt (bold orange, with countdown)
  • ↻N — Retry count annotation on running agents
  • ▸Model — Fallback model annotation (e.g., ↻3▸flash)

Metadata Tracking

After execution completes, retry metadata is written to done.json in the agent's artifacts directory:

{
  "retry_count": 2,
  "retry_errors": ["An unexpected critical error occurred: ..."],
  "used_fallback": false
}

Source: src/sase/llm_provider/retry_config.py, src/sase/axe/run_agent_exec.py

Spawn-on-Retry

When ProviderRetryConfig.spawn_new_agent=True, a retryable error spawns a fresh detached child agent (as if sase run -d had been invoked) instead of running the next attempt in-process. The failing parent transfers its workspace claim to the child via transfer_workspace_claim() and exits with status FAILED (RETRIED). This trades the small cost of a fresh process for two benefits:

  • The workspace is preserved by design — the child skips prepare_workspace() and inherits the parent's in-progress edits via the transferred workspace claim. (Legacy in-process retry runs prepare_workspace() between attempts and wipes uncommitted file edits unless preserve_workspace=True.)
  • A retry boundary becomes a real process boundary, which is more robust against memory leaks, lingering child processes, and stale interpreter state.

Linkage fields (written to both agent_meta.json and done.json so retry chains are queryable from either side):

Field Meaning
retry_of_timestamp Backward link: the parent agent's run timestamp.
retried_as_timestamp Forward link: the child agent's run timestamp (written on the parent at handoff).
retry_chain_root_timestamp The root agent's timestamp — stable across the entire chain.
retry_attempt Depth in the chain (1-based).

State is carried across the boundary by a retry_handoff.json file written to the parent's artifacts directory; the child reads it before launch.

Fallback behavior: spawn-on-retry is opt-in (default false). If spawning fails (e.g. workspace transfer fails), the legacy in-process retry runs as a fallback so the user is never worse off.

Source: src/sase/axe/run_agent_retry_spawn.py, src/sase/llm_provider/retry_config.py

Token Usage Tracking

The LLM provider layer tracks token usage for Claude Code agent runs. Input tokens, output tokens, and cache-read tokens are extracted from the Claude Code stream-json result events and persisted as a usage.json artifact in the agent run directory.

Artifact Format

{
  "input_tokens": 12345,
  "output_tokens": 6789,
  "cache_read_tokens": 3456
}

When telemetry is enabled, token counts are also recorded as Prometheus counters (sase_llm_input_tokens_total, sase_llm_output_tokens_total, sase_llm_cache_read_tokens_total) for monitoring and dashboards. See docs/telemetry.md for the full telemetry reference.

Source: src/sase/llm_provider/_subprocess.py, src/sase/llm_provider/types.py

Prompt Preprocessing Pipeline

Before any prompt reaches a provider, it passes through a 6-step preprocessing pipeline defined in preprocessing.py.

Steps

# Step Syntax Description
1 xprompt references #name Expand reusable inline prompt snippets from xprompts
2 Command substitution $(cmd) Execute shell commands and inline their output
3 File references @path Inline file contents (copy absolute/tilde paths)
4 Jinja2 rendering {{ var }} Render Jinja2 templates after all prior expansions
5 Prettier formatting - Format with prettier for consistent markdown
6 Comment stripping <!-- ... --> Remove HTML/markdown comments

Order Matters

The pipeline runs in strict order. Jinja2 rendering (step 4) happens after xprompt, command substitution, and file reference expansion, so templates can reference content injected by earlier steps.

Home Mode

When is_home_mode=True, file reference processing skips copying files (step 3). This is used when the invocation doesn't need side effects from @path references.

Source Functions

The preprocessing steps delegate to functions from two libraries:

  • xprompt: process_xprompt_references(), is_jinja2_template(), render_toplevel_jinja2()
  • gemini_wrapper.file_references: process_command_substitution(), process_file_references(), format_with_prettier(), strip_html_comments()
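
A sketch of how the six steps chain together, using the function names listed above; the keyword arguments and exact signatures are assumptions.

from xprompt import process_xprompt_references, is_jinja2_template, render_toplevel_jinja2
from gemini_wrapper.file_references import (
    process_command_substitution,
    process_file_references,
    format_with_prettier,
    strip_html_comments,
)


def preprocess_prompt(prompt: str, *, is_home_mode: bool = False) -> str:
    prompt = process_xprompt_references(prompt)                            # 1. #name snippets
    prompt = process_command_substitution(prompt)                          # 2. $(cmd) output
    prompt = process_file_references(prompt, copy_files=not is_home_mode)  # 3. @path contents
    if is_jinja2_template(prompt):
        prompt = render_toplevel_jinja2(prompt)                            # 4. {{ var }} templates
    prompt = format_with_prettier(prompt)                                  # 5. consistent markdown
    return strip_html_comments(prompt)                                     # 6. <!-- ... --> removal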

Subprocess Streaming

Providers use shared helpers in _subprocess.py to stream LLM output in real-time.

Mechanism

  1. The provider spawns the CLI tool via subprocess.Popen with stdout=PIPE and stderr=PIPE; providers that consume prompts from stdin also set stdin=PIPE.
  2. The prompt is supplied using the provider's documented transport, either stdin or an argv message argument.
  3. Both stdout and stderr file descriptors are set to non-blocking mode via os.set_blocking().
  4. A select.select() loop with a 0.1s timeout polls for readable data on both streams.
  5. Lines are read and optionally printed to the console in real-time.
  6. After the process exits (process.poll() is not None), any remaining buffered output is drained.
  7. The function returns (stdout_content, stderr_content, return_code).
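
A condensed sketch of that loop (simplified: no console printing, no live-reply file, stdin transport only):

import os
import select
import subprocess


def stream_process_output(cmd, prompt=None):
    """Spawn `cmd`, optionally feed `prompt` on stdin, and drain stdout/stderr via select()."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE if prompt is not None else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if prompt is not None:
        proc.stdin.write(prompt.encode())
        proc.stdin.close()
    for stream in (proc.stdout, proc.stderr):
        os.set_blocking(stream.fileno(), False)          # non-blocking reads
    out, err = bytearray(), bytearray()
    while True:
        readable, _, _ = select.select([proc.stdout, proc.stderr], [], [], 0.1)
        for stream in readable:
            chunk = stream.read() or b""
            (out if stream is proc.stdout else err).extend(chunk)
        if proc.poll() is not None:
            out.extend(proc.stdout.read() or b"")        # drain remaining buffered output
            err.extend(proc.stderr.read() or b"")
            return out.decode(), err.decode(), proc.returncode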

Live Reply File

When SASE_ARTIFACTS_DIR is set, the streaming output is also written in real-time to <SASE_ARTIFACTS_DIR>/live_reply.md. This file is used by the ACE TUI Agents tab to display the agent's reply as it streams in, and remains available after execution completes for the metadata panel's AGENT REPLY section.

Output Suppression

When suppress_output=True, lines are still captured but not printed to the console. This is used for background invocations where the caller only needs the final result.

Postprocessing

After a provider returns (or raises an error), the orchestration layer runs postprocessing steps.

On Success (postprocess_success)

  1. Audio notification: Plays a sound via run_bam_command("Agent reply received") (skipped if suppress_output).
  2. Log to sase.md: Appends a timestamped entry with the prompt and response to <artifacts_dir>/sase.md (if artifacts_dir is set).
  3. Save chat history: Writes to ~/.sase/chats/ if workflow is set. See Chat History.

On Error (postprocess_error)

  1. Rich error display: Prints the prompt and error via print_prompt_and_response() with an _ERROR suffix on the agent type label (skipped if suppress_output).
  2. Log to sase.md: Same as success, but the response is the error message and the agent type gets an _ERROR suffix.
  3. Save error chat history: Writes to ~/.sase/chats/ with an _ERROR agent suffix.

sase.md Log Format

Each entry in the log file follows this format:

## <timestamp> - <agent_type> - iteration <N> - tag <workflow_tag>

### PROMPT:

```
<prompt text>
```

### RESPONSE:

```
<response text>
```

---

Prompt File Saving

Before invocation, the preprocessed prompt is saved to <artifacts_dir>/<agent_type>_prompt.md (or <agent_type>_iter_<N>_prompt.md if an iteration number is set). This allows reviewing the exact prompt that was sent.

Chat History

Chat histories are stored as markdown files in ~/.sase/chats/.

File Naming

<branch_or_workspace>-<workflow>-[<agent>-]<timestamp>.md
Part Source Example
branch_or_workspace Output of branch_or_workspace_name my_feature
workflow Workflow name, normalized crs, run
agent Agent type (omitted if same as workflow) editor, planner
timestamp YYmmdd_HHMMSS format 260214_153042

Dashes and slashes in workflow names are normalized to underscores.

File Format

# Chat History - <workflow> (<agent>)

**Timestamp:** <display_timestamp>

## Previous Conversation

<previous history if resuming>

---

## Prompt

<prompt text>

## Response

<response text>

Resume Support

The sase run --resume flag resumes a previous conversation by agent name. The #resume workflow resolves the agent name to its artifacts directory, extracts the response path from done.json, and delegates to #resume_by_chat, which loads the chat history and prepends it to the new conversation. The --resume flag also accepts a history file basename or full path for direct chat-file-based resumption via the #resume_by_chat workflow.

Resume expansion is recursive: if the loaded chat history itself contains #resume or #resume_by_chat references, those are expanded inline as well. Cycle detection prevents infinite loops when chat histories reference each other.
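
A sketch of the recursive expansion with a cycle guard; the `#resume_by_chat <file>` reference syntax assumed by the regex is a simplification of the real directive handling.

import re
from pathlib import Path

_RESUME_RE = re.compile(r"#resume_by_chat\s+(\S+)")


def expand_resume_references(text: str, chats_dir: Path, seen: set[str] | None = None) -> str:
    """Inline referenced chat histories recursively, refusing to revisit a file (cycle guard)."""
    seen = set() if seen is None else seen

    def _inline(match: re.Match) -> str:
        name = match.group(1)
        if name in seen:
            return f"<!-- cycle detected, skipping {name} -->"
        seen.add(name)
        history = (chats_dir / name).read_text()
        return expand_resume_references(history, chats_dir, seen)

    return _RESUME_RE.sub(_inline, text)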

Invocation Lifecycle

The invoke_agent() function in _invoke.py orchestrates the complete lifecycle of an LLM invocation. Here is the end-to-end flow:

invoke_agent(prompt, agent_type, model_tier, ...)
│
├── 1. Handle deprecated model_size → model_tier mapping
├── 2. Check SASE_MODEL_TIER_OVERRIDE / SASE_MODEL_SIZE_OVERRIDE env vars
├── 3. Build LoggingContext from parameters
│
├── 4. Preprocess prompt (6-step pipeline)
│   ├── xprompt references (#name)
│   ├── Command substitution ($(cmd))
│   ├── File references (@path)
│   ├── Jinja2 rendering ({{ var }})
│   ├── Prettier formatting
│   └── Comment stripping
│
├── 5. Display decision counts (if not suppressed)
├── 6. Print prompt via Rich (if not suppressed)
├── 7. Generate or use provided timestamp
├── 8. Save prompt to artifacts directory
│
├── 9. Get provider from registry and invoke
│   ├── Build CLI command with flags
│   ├── Spawn subprocess (Popen)
│   ├── Supply prompt via provider transport
│   └── Stream stdout/stderr in real-time
│
├── 10. Postprocess
│   ├── Success path:
│   │   ├── Audio notification
│   │   ├── Log to sase.md
│   │   └── Save chat history
│   └── Error path:
│       ├── Rich error display
│       ├── Log error to sase.md
│       └── Save error chat history
│
└── 11. Return AIMessage(content=response)

Parameters

Parameter Type Default Description
prompt str (required) Raw prompt to send
agent_type str (required) Agent type label (e.g., "editor")
model_tier ModelTier "large" Model tier to use
model_size "big" \| "little" \| None None Deprecated, use model_tier
iteration int \| None None Iteration number for logging
workflow_tag str \| None None Workflow tag for logging
artifacts_dir str \| None None Directory for sase.md and prompt files
workflow str \| None None Workflow name for chat history
suppress_output bool False Suppress console output
timestamp str \| None None Shared timestamp (YYmmdd_HHMMSS)
is_home_mode bool False Skip file copying for @ references
decision_counts dict[str, Any] \| None None Planning agent decision counts
provider_name str \| None None Override provider (default from config)

Return Value

Always returns an AIMessage (from langchain_core.messages). On error, the content field contains the error message rather than a response.
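
A minimal usage sketch; the import path is assumed from the source layout above (invoke_agent lives in _invoke.py and is re-exported from the package).

from sase.llm_provider import invoke_agent

message = invoke_agent(
    "Summarize the open TODOs in this repository.",
    agent_type="editor",
    model_tier="small",
    provider_name="claude",      # optional; omit to use the configured default provider
    suppress_output=True,        # capture the reply without streaming it to the console
)
print(message.content)           # AIMessage.content holds the response (or the error text)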