LLM Provider Integration¶
This document describes the LLM provider abstraction layer in sase. The system supports pluggable LLM backends (Claude Code, Codex, Gemini CLI, Qwen Code, and OpenCode are bundled; additional providers can ship as external plugins) behind a shared orchestration layer that handles preprocessing, invocation, and postprocessing.
Table of Contents¶
- Overview
- Provider Architecture
- Claude Code Integration
- Gemini CLI Integration
- Codex CLI Integration
- Qwen Code Integration
- OpenCode Integration
- External Provider Plugins
- Configuration
- Model Tier System
- Temporary Default Override
- Environment Variables
- CLI Flags
- Prompt Preprocessing Pipeline
- Subprocess Streaming
- Postprocessing
- Chat History
- Invocation Lifecycle
Overview¶
The LLM provider layer decouples prompt handling from the underlying LLM backend. All providers share a common preprocessing pipeline, subprocess streaming mechanism, and postprocessing workflow. The actual LLM invocation is delegated to a pluggable provider selected at runtime.
Key design principles:
- Providers are thin: They only construct CLI commands and run subprocesses. All preprocessing and postprocessing lives in the shared orchestration layer.
- Registry-based selection: Providers register themselves by name and are resolved via config or explicit override.
- Tier-based model selection: Callers request a "large" or "small" tier; the provider maps it to a concrete model.
Source Layout¶
| File | Purpose |
|---|---|
| `src/sase/llm_provider/__init__.py` | Public API exports |
| `src/sase/llm_provider/base.py` | `LLMProvider` abstract base class |
| `src/sase/llm_provider/_hookspec.py` | Pluggy hook specifications (`LLMHookSpec`) |
| `src/sase/llm_provider/_plugin_manager.py` | Plugin manager wrapping pluggy (`LLMPluginManager`) |
| `src/sase/llm_provider/claude.py` | Claude Code provider implementation |
| `src/sase/llm_provider/gemini.py` | Gemini CLI provider implementation |
| `src/sase/llm_provider/qwen.py` | Qwen Code provider implementation |
| `src/sase/llm_provider/opencode.py` | OpenCode provider implementation |
| `src/sase/llm_provider/registry.py` | Provider registration and lookup |
| `src/sase/llm_provider/config.py` | Config file reader (`sase.yml`) |
| `src/sase/llm_provider/types.py` | `ModelTier`, `LoggingContext` types |
| `src/sase/llm_provider/_invoke.py` | `invoke_agent()` orchestrator |
| `src/sase/llm_provider/_subprocess.py` | `stream_process_output()` |
| `src/sase/llm_provider/codex.py` | Codex CLI provider implementation |
| `src/sase/llm_provider/_plan_utils.py` | Shared plan utilities |
| `src/sase/llm_provider/preprocessing.py` | 6-step preprocessing pipeline |
| `src/sase/llm_provider/postprocessing.py` | Logging, chat history, audio |
| `src/sase/llm_provider/retry_config.py` | `ProviderRetryConfig` (per-provider retry defaults) |
Provider Architecture¶
Base Class¶
All providers implement the LLMProvider abstract base class:
```python
class LLMProvider(ABC):
    @abstractmethod
    def invoke(
        self,
        prompt: str,
        *,
        model_tier: ModelTier,
        suppress_output: bool = False,
    ) -> str: ...
```
| Parameter | Type | Description |
|---|---|---|
| `prompt` | `str` | Already-preprocessed prompt text |
| `model_tier` | `ModelTier` | `"large"` or `"small"` |
| `suppress_output` | `bool` | If `True`, suppress real-time console output |
Returns the raw response text. Raises subprocess.CalledProcessError on failure.
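As an illustration, a minimal hypothetical provider that satisfies this contract might look like the following; it shells out to `cat` purely to show the shape, and assumes the imports match the source layout table above.

```python
import subprocess

from sase.llm_provider.base import LLMProvider
from sase.llm_provider.types import ModelTier


class EchoProvider(LLMProvider):
    """Hypothetical provider used only to illustrate the contract."""

    def invoke(
        self,
        prompt: str,
        *,
        model_tier: ModelTier,
        suppress_output: bool = False,
    ) -> str:
        # A real provider would build its CLI command from the tier here.
        result = subprocess.run(["cat"], input=prompt, capture_output=True, text=True)
        if result.returncode != 0:
            raise subprocess.CalledProcessError(
                result.returncode, result.args, result.stdout, result.stderr
            )
        return result.stdout
```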
Registry¶
Providers are registered by name in a global registry (registry.py). They are discovered via importlib.metadata.entry_points(group="sase_llm") and registered automatically on module import. Built-in entries live in
pyproject.toml:
[project.entry-points."sase_llm"]
claude = "sase.llm_provider.claude:ClaudeCodeProvider"
codex = "sase.llm_provider.codex:CodexProvider"
gemini = "sase.llm_provider.gemini:GeminiProvider"
opencode = "sase.llm_provider.opencode:OpenCodeProvider"
qwen = "sase.llm_provider.qwen:QwenProvider"
External plugin packages declare additional entries under the same group.
To get a provider instance:
provider = get_provider() # Uses default from config
provider = get_provider("claude") # Explicit provider name
Selection Logic¶
1. If `provider_name` is passed to `invoke_agent()`, use that.
2. Otherwise, read the `llm_provider.provider` field from `~/.config/sase/sase.yml`.
3. If no config exists (or the provider field is empty), auto-detect by walking registered plugins in ascending `llm_autodetect_priority()` order and picking the first whose `llm_autodetect_cli_name()` is on `PATH`. Built-in priorities: `claude=0`, `codex=10`, `qwen=15`, `opencode=18`, `gemini=30`. External plugins slot in by declaring their own priority (a sketch of this walk follows below).
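A rough sketch of that auto-detect walk, assuming each plugin object exposes the two hooks named above plus a `name` attribute (the real plugin-manager wiring lives in `_plugin_manager.py`):

```python
import shutil


def autodetect_provider(plugins) -> str | None:
    """Return the first plugin (by ascending priority) whose CLI binary is on PATH."""
    for plugin in sorted(plugins, key=lambda p: p.llm_autodetect_priority()):
        if shutil.which(plugin.llm_autodetect_cli_name()):
            return plugin.name  # e.g. "claude" when the claude CLI is installed
    return None  # nothing installed; provider resolution fails downstream
```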
Claude Code Integration¶
The ClaudeCodeProvider invokes the claude CLI tool.
Command Construction¶
claude -p --model <alias> --output-format text --dangerously-skip-permissions [extra_args...]
The prompt is written to stdin, and output is streamed from stdout in real-time.
Model Mapping¶
| Tier | Claude CLI Alias |
|---|---|
| `large` | `opus` |
| `small` | `sonnet` |
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_LLM_LARGE_ARGS` | Extra CLI args for large tier (generic, preferred) |
| `SASE_LLM_SMALL_ARGS` | Extra CLI args for small tier (generic, preferred) |
| `SASE_CLAUDE_LARGE_ARGS` | Extra CLI args for large tier (Claude-specific fallback) |
| `SASE_CLAUDE_SMALL_ARGS` | Extra CLI args for small tier (Claude-specific fallback) |
The generic SASE_LLM_*_ARGS variables take precedence. Values are split on whitespace and appended to the command.
Timer Display¶
While waiting for a response, a gemini_timer("Waiting for Claude") spinner is shown (unless suppress_output is
True).
Gemini CLI Integration¶
The GeminiProvider invokes Google's Gemini CLI tool.
Command Construction¶
gemini --yolo [extra_args...]
The prompt is written to stdin, and output is streamed from stdout in real-time.
Default Model¶
The Gemini provider uses gemini-3-flash-preview as its default model. This can be overridden per-prompt using the
%model directive (e.g., %model:gemini-2.5-flash).
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_GEMINI_PATH` | Path to the Gemini CLI binary (default: `gemini`) |
Timer Display¶
While waiting for a response, a gemini_timer("Waiting for Gemini") spinner is shown (unless suppress_output is
True).
Codex CLI Integration¶
The CodexProvider invokes the OpenAI codex CLI tool.
Command Construction¶
Normal mode:
codex exec --model <model> --dangerously-bypass-approvals-and-sandbox --json --color never --skip-git-repo-check - [extra_args...]
The prompt is written to stdin. Output is streamed as NDJSON events, with assistant text extracted from item.completed
events.
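As a simplified sketch, extracting assistant text from such a stream looks roughly like this (the event field names are illustrative assumptions; the actual Codex event schema is what `codex.py` parses):

```python
import json


def extract_assistant_text(ndjson_lines: list[str]) -> str:
    """Collect assistant text from item.completed events in an NDJSON stream."""
    chunks: list[str] = []
    for line in ndjson_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if event.get("type") == "item.completed":
            # Field layout below is assumed for illustration only.
            text = event.get("item", {}).get("text", "")
            if text:
                chunks.append(text)
    return "\n".join(chunks)
```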
Model Mapping¶
| Tier | Codex Model |
|---|---|
| `large` | `gpt-5.5` |
| `small` | `codex-mini-latest` |
Plan Mode¶
When SASE_AGENT_PLAN_MODE is set, Codex runs a two-phase plan/implement flow:
1. **Phase 1 (Planning)**: Runs with `--sandbox read-only` and `--ask-for-approval on-request`. The model generates a plan captured via `--output-last-message`, on-disk plan files, or streamed response text.
2. **Approval**: The plan is presented for user approval with up to 5 feedback-retry rounds.
3. **Phase 2 (Implementation)**: On approval, runs with full permissions (`--dangerously-bypass-approvals-and-sandbox`) using the plan content as the prompt.
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_LLM_LARGE_ARGS` | Extra CLI args for large tier (generic, preferred) |
| `SASE_LLM_SMALL_ARGS` | Extra CLI args for small tier (generic, preferred) |
| `SASE_CODEX_PATH` | Path to the Codex CLI binary (default: PATH, then NVM_BIN) |
| `SASE_CODEX_LARGE_ARGS` | Extra CLI args for large tier (Codex-specific fallback) |
| `SASE_CODEX_SMALL_ARGS` | Extra CLI args for small tier (Codex-specific fallback) |
| `SASE_CODEX_DISABLE_SHADOW_HOME` | Set to `1` to disable the disposable Codex home |
| `SASE_AGENT_PLAN_MODE` | Enable two-phase plan/implement flow |
The generic SASE_LLM_*_ARGS variables take precedence over SASE_CODEX_*_ARGS.
By default, SASE launches Codex with a per-invocation shadow CODEX_HOME under ~/.cache/sase/codex_home/. The shadow
home copies config.toml and symlinks other Codex home entries back to the real Codex home so Codex can read auth,
hooks, skills, logs, and caches while any config rewrites stay disposable. The shadow directory is removed after each
Codex subprocess exits. Set SASE_CODEX_DISABLE_SHADOW_HOME=1 to pass through the inherited environment directly for
debugging or emergency compatibility.
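A condensed sketch of that setup, with the helper name and exact copy/symlink policy invented for illustration (the real logic lives in `codex.py`):

```python
import shutil
import tempfile
from pathlib import Path


def make_shadow_codex_home(real_home: Path = Path.home() / ".codex") -> Path:
    """Copy config.toml, symlink everything else, so config rewrites stay disposable."""
    cache_root = Path.home() / ".cache" / "sase" / "codex_home"
    cache_root.mkdir(parents=True, exist_ok=True)
    shadow = Path(tempfile.mkdtemp(dir=cache_root))
    for entry in real_home.iterdir():
        if entry.name == "config.toml":
            shutil.copy2(entry, shadow / entry.name)  # writable, disposable copy
        else:
            (shadow / entry.name).symlink_to(entry)  # auth, hooks, skills, logs stay shared
    return shadow  # caller sets CODEX_HOME=<shadow> and deletes it after the subprocess exits
```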
Timer Display¶
While waiting for a response, a gemini_timer("Waiting for Codex") spinner is shown (unless suppress_output is
True). In plan mode, the timer reads "Waiting for Codex (planning)" during Phase 1 and "Implementing plan" during
Phase 2.
Qwen Code Integration¶
The QwenProvider invokes the qwen CLI tool.
Command Construction¶
qwen --input-format text --output-format stream-json --yolo --model <model> [extra_args...]
The prompt is written to stdin using Qwen's text input mode. Output is streamed as JSON events; SASE extracts assistant
text from assistant events and falls back to the final result text when no assistant text is emitted.
Model Mapping¶
| Tier | Qwen Model |
|---|---|
| `large` | `qwen3-coder-plus` |
| `small` | `qwen3-coder-flash` |
Authentication¶
Configure Qwen Code through its supported auth and settings flow before using it from SASE. Qwen OAuth free tier access ended on 2026-04-15; use API keys, Alibaba Cloud Coding Plan, OpenRouter, Fireworks, or another Qwen-supported provider instead of relying on the discontinued OAuth free tier.
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_LLM_LARGE_ARGS` | Extra CLI args for large tier (generic, preferred) |
| `SASE_LLM_SMALL_ARGS` | Extra CLI args for small tier (generic, preferred) |
| `SASE_QWEN_PATH` | Path to the Qwen Code CLI binary (default: `qwen`) |
| `SASE_QWEN_LARGE_ARGS` | Extra CLI args for large tier (Qwen-specific fallback) |
| `SASE_QWEN_SMALL_ARGS` | Extra CLI args for small tier (Qwen-specific fallback) |
The generic SASE_LLM_*_ARGS variables take precedence over SASE_QWEN_*_ARGS.
Qwen Code config is left in Qwen's normal locations (~/.qwen/settings.json and project .qwen/settings.json). SASE
does not create a shadow Qwen home in this first implementation because a local Qwen install was unavailable during
this phase, so headless-run config mutation could not be verified.
Timer Display¶
While waiting for a response, a gemini_timer("Waiting for Qwen") spinner is shown (unless suppress_output is
True).
OpenCode Integration¶
The OpenCodeProvider invokes the opencode CLI tool.
Command Construction¶
opencode run --format json --dangerously-skip-permissions --model <provider/model> --dir <cwd> [extra_args...] <prompt>
The prompt is passed as OpenCode's run [message..] argument without shell interpolation. Output is streamed as JSONL
events; SASE extracts assistant text from text events, captures errors from error events, and accumulates token
counters from step_finish events when OpenCode reports them.
Model Mapping¶
OpenCode model IDs normally include an upstream provider prefix. Use %model:opencode/<provider/model> to route a
single SASE prompt to a concrete OpenCode model.
| Tier | OpenCode Model |
|---|---|
| `large` | `anthropic/claude-sonnet-4-5` |
| `small` | `openai/gpt-5-mini` |
Authentication and Config¶
Configure OpenCode through its normal auth and settings flow before using it from SASE. OpenCode stores auth under its
XDG data directory and reads config from its XDG config directory plus project .opencode config. Use opencode models
to inspect the models available in your configured OpenCode environment.
SASE deploys OpenCode skills under ~/.config/opencode/skills/, which OpenCode scans as part of its config directory.
SASE does not create a shadow OpenCode data/config home in this first implementation because OpenCode's normal headless
run writes session/database state under its XDG data directory while reading auth/config from the standard locations.
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_LLM_LARGE_ARGS` | Extra CLI args for large tier (generic, preferred) |
| `SASE_LLM_SMALL_ARGS` | Extra CLI args for small tier (generic, preferred) |
| `SASE_OPENCODE_PATH` | Path to the OpenCode CLI binary (default: `opencode`) |
| `SASE_OPENCODE_LARGE_ARGS` | Extra CLI args for large tier (OpenCode-specific fallback) |
| `SASE_OPENCODE_SMALL_ARGS` | Extra CLI args for small tier (OpenCode-specific fallback) |
The generic SASE_LLM_*_ARGS variables take precedence over SASE_OPENCODE_*_ARGS.
Timer Display¶
While waiting for a response, a gemini_timer("Waiting for OpenCode") spinner is shown (unless suppress_output is
True).
External Provider Plugins¶
Additional LLM providers ship as external packages that declare [project.entry-points."sase_llm"] in their own
pyproject.toml. Each plugin owns its CLI invocation details and carries all of its own metadata (model names, skill
deploy path, CLI status color, auto-detect priority, retry defaults) via pluggy @hookimpl methods, so sase core has no
plugin-specific branching. Install the provider package in the same environment as sase to make its sase_llm entry
point available.
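As a sketch, a hypothetical `sase-mistral` package would add `mistral = "sase_mistral.provider:MistralProvider"` under the same entry-point group in its own pyproject.toml and implement the hooks it needs. The hook names below are the ones referenced in this document; the pluggy marker project name is an assumption (see `_hookspec.py` for the real one).

```python
import pluggy

# Marker project name assumed to match the sase hookspec group.
hookimpl = pluggy.HookimplMarker("sase_llm")


class MistralProvider:
    """Hypothetical external provider plugin."""

    name = "mistral"

    @hookimpl
    def llm_autodetect_cli_name(self) -> str:
        return "mistral"  # binary checked on PATH during auto-detect

    @hookimpl
    def llm_autodetect_priority(self) -> int:
        return 50  # slots in after the built-ins (claude=0 ... gemini=30)

    @hookimpl
    def llm_known_model_names(self) -> list[str]:
        return ["mistral-large", "mistral-small"]  # enables bare-name %model resolution
```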
Configuration¶
The LLM provider reads its configuration from ~/.config/sase/sase.yml under the llm_provider key.
Config File¶
```yaml
llm_provider:
  provider: claude  # or "codex", "gemini", "qwen", "opencode" (default: auto-detect)
  model_tier_map:
    large: opus
    small: sonnet
```
Config Fields¶
| Field | Type | Default | Description |
|---|---|---|---|
| `llm_provider.provider` | string | auto-detect | Which registered provider to use. Auto-detects by plugin-declared priority; built-ins default to claude → codex → qwen → opencode → gemini. |
| `llm_provider.model_tier_map.large` | string | - | Model identifier for the large tier |
| `llm_provider.model_tier_map.small` | string | - | Model identifier for the small tier |
Per-Prompt Provider Switching¶
The %model directive (see xprompt directives) can switch both the model and the LLM provider
for a single prompt. Provider resolution uses two strategies:
Explicit Provider/Model Syntax¶
Use provider/model to specify both explicitly:
%model:codex/o3
%model:claude/opus
%model:gemini/gemini-2.5-pro
%model:qwen/qwen3-coder-plus
%model:opencode/anthropic/claude-sonnet-4-5
Automatic Provider Resolution¶
Known model names are automatically mapped to their provider:
| Model Name | Provider |
|---|---|
| `opus`, `sonnet`, `haiku` | claude |
| `gpt-5.5`, `gpt-5.3-codex`, `codex-mini-latest`, `o3`, `o4-mini`, `gpt-5.4`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4o`, `gpt-4o-mini` | codex |
| `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-3.1-pro-preview`, `gemini-3-flash-preview`, `gemini-2.0-flash` | gemini |
| `qwen3-coder-plus`, `qwen3-coder-flash`, `qwen3-max`, `qwen-plus`, `qwen-max` | qwen |
| `anthropic/claude-sonnet-4-5`, `openai/gpt-5-mini`, `qwen/qwen3-coder-plus` | opencode |
Each installed plugin contributes its own model names via the llm_known_model_names() hook.
For unrecognized model names, the default provider is used.
Source: src/sase/llm_provider/registry.py
Model Tier System¶
The model tier system abstracts away specific model names. Callers request either "large" (most capable) or "small"
(faster/cheaper), and the provider maps the tier to a concrete model.
Type Definition¶
ModelTier = Literal["large", "small"]
Legacy Mapping¶
The old "big"/"little" terminology is still supported for backward compatibility:
| Old Value | New Tier | Display Label |
|---|---|---|
| `"big"` | `"large"` | BIG |
| `"little"` | `"small"` | LITTLE |
The model_size parameter on invoke_agent() is deprecated. Use model_tier instead.
Global Override¶
The model tier can be overridden globally via environment variable or CLI flag. The override forces ALL invocations to use the specified tier regardless of what the caller requests.
Resolution order:
1. `SASE_MODEL_TIER_OVERRIDE` env var (accepts `"large"`, `"small"`, `"big"`, `"little"`)
2. `SASE_MODEL_SIZE_OVERRIDE` env var (legacy, same values)
3. `--model-tier` / `--model-size` CLI flag (sets the env var)
4. Caller's `model_tier` parameter (default: `"large"`)
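In code, that resolution order amounts to something like the following sketch (the normalization map is an assumption based on the legacy mapping above):

```python
import os

_LEGACY_TIERS = {"big": "large", "little": "small"}


def resolve_model_tier(caller_tier: str = "large") -> str:
    """Env overrides win over the caller's tier; legacy values are normalized."""
    for var in ("SASE_MODEL_TIER_OVERRIDE", "SASE_MODEL_SIZE_OVERRIDE"):
        raw = os.environ.get(var, "").strip().lower()
        if raw:
            return _LEGACY_TIERS.get(raw, raw)  # "big"/"little" map to "large"/"small"
    return caller_tier  # --model-tier / --model-size only set the env vars upstream
```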
Temporary Default Override¶
In addition to the tier-based global override, sase supports a concrete provider/model override that acts as a
temporary session-level default. This is the override the ACE ,P chord writes (see
docs/ace.md for the TUI flow).
The temporary override only changes the default provider/model selection for new agent launches. It does not affect:
- Already-running agents — they keep whatever provider/model they were launched with.
- Explicit `%model` prompt directives — they still take precedence.
- An explicit `provider_name=` argument to `invoke_agent()` — it still wins.
- `SASE_MODEL_TIER_OVERRIDE` / `SASE_MODEL_SIZE_OVERRIDE` — those force a tier across all invocations regardless of this override; they layer on top, not under.
Resolution Order (default provider/model)¶
When no %model directive and no explicit provider_name are present, the default is resolved as:
1. Active temporary override at `~/.sase/llm_override.json` (if not expired).
2. `llm_provider.provider` from the merged `sase.yml` config.
3. Auto-detection by plugin-declared priority (built-ins: claude, codex, qwen, opencode, then gemini).
A concrete temporary override sets both the default provider and a concrete model_override for the next launch — so
the agent metadata (running marker, plan review badge, agent rows) reflects the actual model that will run, not just the
configured default.
State File¶
```json
{
  "provider": "opencode",
  "model": "anthropic/claude-sonnet-4-5",
  "raw_model": "opencode/anthropic/claude-sonnet-4-5",
  "created_at": 1777470000.0,
  "expires_at": 1777473600.0,
  "source": "ace"
}
```
| Field | Type | Description |
|---|---|---|
| `provider` | `str` | Resolved provider name (e.g. "claude", "codex", "opencode"). |
| `model` | `str` | Concrete model passed to the provider (e.g. "o3", "opus"). |
| `raw_model` | `str` | Original user input (e.g. "codex/o3", "opencode/anthropic/..."). |
| `created_at` | `float` | Unix timestamp when the override was set. |
| `expires_at` | `float \| None` | Unix timestamp when the override expires; `null` means "until cleared". |
| `source` | `str` | Free-form tag indicating who set the override (e.g. "ace"). |
Writes are atomic (temp file + os.replace). Reads are best-effort self-cleaning: an expired or unparseable file is
deleted on next access, so a forgotten override never lingers past its expires_at, even with no TUI running.
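A minimal sketch of that write/read behaviour, with field handling trimmed down (the real helpers are listed under Public API below):

```python
import json
import os
import time
from pathlib import Path

OVERRIDE_PATH = Path.home() / ".sase" / "llm_override.json"


def write_override(data: dict) -> None:
    tmp = OVERRIDE_PATH.parent / (OVERRIDE_PATH.name + ".tmp")
    tmp.write_text(json.dumps(data))
    os.replace(tmp, OVERRIDE_PATH)  # atomic swap: readers never see a half-written file


def read_override(now: float | None = None) -> dict | None:
    now = time.time() if now is None else now
    try:
        data = json.loads(OVERRIDE_PATH.read_text())
        expires = data.get("expires_at")
        if expires is not None and now >= expires:
            raise ValueError("override expired")
    except FileNotFoundError:
        return None
    except ValueError:  # covers json.JSONDecodeError and the expiry check above
        OVERRIDE_PATH.unlink(missing_ok=True)  # self-cleaning: drop expired or unparseable files
        return None
    return data
```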
Model Resolution¶
The user-supplied raw_model is normalized through the same rules as %model:
- `provider/model` selects the provider explicitly (e.g. `codex/o3` or `opencode/anthropic/claude-sonnet-4-5`).
- A bare known model name infers its provider from plugin metadata (e.g. `sonnet` → claude).
- An unknown bare model is accepted and runs on the current default provider, matching `%model` behavior.
Duration Parsing¶
Durations accept compact unit suffixes: 15m, 1h, 1h30m, 90m, 2h15m30s. Bare integers are interpreted as
minutes (45 → 45 minutes). The case-insensitive sentinel until cleared (or until_cleared) means "no expiry —
persists until the user clears it from the TUI or another sase process clears the state file."
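A sketch of that grammar (not the exact `parse_override_duration()` implementation):

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(value: str) -> int | None:
    """'15m', '1h30m', '2h15m30s' → seconds; bare integers are minutes; 'until cleared' → None."""
    text = value.strip().lower()
    if text in ("until cleared", "until_cleared"):
        return None  # no expiry
    if text.isdigit():
        return int(text) * 60  # bare integer means minutes
    total = 0
    for amount, unit in re.findall(r"(\d+)([hms])", text):
        total += int(amount) * _UNIT_SECONDS[unit]
    return total
```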
Public API¶
The override primitives live in src/sase/llm_provider/temporary_override.py:
| Function | Purpose |
|---|---|
| `get_active_temporary_override(now=None)` | Read the active override (auto-deletes expired/malformed files). |
| `set_temporary_override(raw, dur, source=)` | Write a new override, replacing any existing one. |
| `clear_temporary_override()` | Remove the override file. Safe to call when nothing is active. |
| `parse_override_duration(value)` | Parse a user-facing duration string into seconds (or None). |
| `resolve_effective_default_provider_model()` | Centralized helper used by metadata pre-resolution paths. |
Examples¶
- ACE chord `,P`, pick `codex/o3`, duration `1h` → `~/.sase/llm_override.json` is written; new launches default to CODEX(o3) for the next hour.
- ACE chord `,P`, pick `opencode/anthropic/claude-sonnet-4-5`, duration `1h` → new launches default to OPENCODE(anthropic/claude-sonnet-4-5).
- ACE chord `,P`, pick `sonnet`, duration `30m` → known bare model; provider resolves to claude via plugin metadata.
- ACE chord `,P`, choose Clear override → `~/.sase/llm_override.json` is removed; defaults revert to permanent config / autodetect.
Environment Variables¶
Complete reference of environment variables used by the LLM provider layer.
Generic (Provider-Agnostic)¶
| Variable | Description |
|---|---|
| `SASE_LLM_LARGE_ARGS` | Extra CLI args for large tier invocations |
| `SASE_LLM_SMALL_ARGS` | Extra CLI args for small tier invocations |
| `SASE_MODEL_TIER_OVERRIDE` | Force all invocations to a specific model tier |
| `SASE_MODEL_SIZE_OVERRIDE` | Legacy alias for `SASE_MODEL_TIER_OVERRIDE` |
Claude-Specific¶
| Variable | Description |
|---|---|
| `SASE_CLAUDE_LARGE_ARGS` | Claude-specific extra args for large tier |
| `SASE_CLAUDE_SMALL_ARGS` | Claude-specific extra args for small tier |
Codex-Specific¶
| Variable | Description |
|---|---|
| `SASE_CODEX_PATH` | Path to the Codex CLI binary |
| `SASE_CODEX_LARGE_ARGS` | Codex-specific extra args for large tier |
| `SASE_CODEX_SMALL_ARGS` | Codex-specific extra args for small tier |
| `SASE_CODEX_DISABLE_SHADOW_HOME` | Disable the disposable Codex shadow home |
| `SASE_AGENT_PLAN_MODE` | Enable Codex two-phase plan/implement flow |
Qwen-Specific¶
| Variable | Description |
|---|---|
| `SASE_QWEN_PATH` | Path to the Qwen Code CLI binary |
| `SASE_QWEN_LARGE_ARGS` | Qwen-specific extra args for large tier |
| `SASE_QWEN_SMALL_ARGS` | Qwen-specific extra args for small tier |
OpenCode-Specific¶
| Variable | Description |
|---|---|
| `SASE_OPENCODE_PATH` | Path to the OpenCode CLI binary |
| `SASE_OPENCODE_LARGE_ARGS` | OpenCode-specific extra args for large tier |
| `SASE_OPENCODE_SMALL_ARGS` | OpenCode-specific extra args for small tier |
Gemini-Specific¶
| Variable | Description |
|---|---|
| `SASE_GEMINI_PATH` | Path to the Gemini CLI binary (default: `gemini`) |
External provider plugins document their own environment variables in their respective repos.
VCS Provider¶
| Variable | Description |
|---|---|
| `SASE_VCS_PROVIDER` | Override VCS provider (`git`, `hg`, or `auto`) |
CLI Flags¶
ace¶
| Flag | Values | Description |
|---|---|---|
| `-m, --model-tier` | `large`, `small` | Override model tier for all LLM invocations |
| `--model-size` | `big`, `little` | Deprecated alias for `--model-tier` |
| `--vcs-provider` | `git`, `hg`, `auto` | Override VCS provider |
axe¶
| Flag | Values | Description |
|---|---|---|
| `--vcs-provider` | `git`, `hg`, `auto` | Override VCS provider |
The ace command wires --model-tier / --model-size into the model_tier_override parameter of AceApp. The
--vcs-provider flag is wired to the SASE_VCS_PROVIDER environment variable for downstream resolution.
Retry and Fallback¶
The LLM provider layer supports per-provider retry and fallback configuration. When an agent encounters a retryable error, it can automatically wait and retry, then optionally fall back to an alternate model.
Configuration¶
Retry behavior is configured per provider under llm_provider.retry in sase.yml:
```yaml
llm_provider:
  retry:
    gemini:
      max_retries: 3
      error_patterns:
        - "An unexpected critical error occurred:"
      wait_times: [60, 300, 1800]
      fallback_model: "gemini-3-flash-preview"
```
Config Fields¶
| Field | Type | Default | Description |
|---|---|---|---|
| `max_retries` | int | `0` | Maximum retry attempts. 0 disables retrying. |
| `error_patterns` | list[str] | `[]` | Case-insensitive substring patterns matched against error output. |
| `wait_times` | list[int] | `[30]` | Per-retry wait times in seconds. Last value reused if list is too short. |
| `fallback_model` | str \| null | `null` | Alternate model to use after exhausting all retries. |
| `continuation_prompt` | str | `""` | Text prepended to `state.current_prompt` on every retry (used to nudge the agent). |
| `spawn_new_agent` | bool | `false` | Opt in to spawn-on-retry: a retryable error spawns a fresh detached child agent (as if `sase run -d` had been invoked) instead of in-process retry. See Spawn-on-Retry below. |
Default Configuration¶
Gemini and Claude have retry defaults (defined in default_config.yml); external provider plugins may declare their own
via the llm_default_retry_config() hook.
Gemini:
- max_retries: `3`
- error_patterns: `["An unexpected critical error occurred:"]`
- wait_times: `[60, 300, 1800]` (1 min, 5 min, 30 min)
- fallback_model: `"gemini-3-flash-preview"`
Claude:
- max_retries: `3`
- error_patterns: `["API Error: 500", "API Error: 529", "Internal server error", "overloaded_error"]`
- wait_times: `[60, 300, 1800]` (1 min, 5 min, 30 min)
- fallback_model: `"sonnet"`
Built-In "Prompt is too long" Recovery (Claude)¶
Claude has an additional built-in retry entry registered internally (not in default_config.yml) that auto-recovers
agents from context-overflow errors without any user config:
- error pattern: `"Prompt is too long"`
- max_retries: `3`
- wait_times: `[0]` — zero-delay retry so a fresh session restarts immediately
- continuation_prompt: a short nudge that tells the coder to inspect `git status` / `git diff` before resuming, since prior edits are preserved on disk when the retry wipes only the in-memory context
User-supplied llm_provider.retry.claude config is merged on top of these built-ins: explicit falsy values
(max_retries: 0 to opt out entirely, continuation_prompt: "" to disable the nudge) override the built-in via
key-presence checks. error_patterns is a de-duplicated union of built-in and user lists.
On every retry attempt the continuation_prompt (if non-empty) is idempotently prepended to state.current_prompt
before the next invocation — the prepend is gated on a startswith check so repeated retries don't stack duplicate
nudges. Workspaces are preserved across built-in context-overflow retries (no workspace wipe), so on-disk edits remain
available to the restarted session.
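That gating boils down to something like the following sketch, where `state` stands in for whatever run-state object carries `current_prompt`:

```python
def apply_continuation_prompt(state, continuation_prompt: str) -> None:
    """Prepend the retry nudge exactly once, no matter how many retries occur."""
    if not continuation_prompt:
        return  # empty string disables the nudge
    if not state.current_prompt.startswith(continuation_prompt):
        state.current_prompt = continuation_prompt + "\n\n" + state.current_prompt
```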
Retry Flow¶
Error detected
│
├── Does error match error_patterns? (case-insensitive substring)
│   ├── No → fail immediately
│   └── Yes → retry_count < max_retries?
│       ├── Yes → wait (wait_times[retry_count]) → retry
│       └── No → fallback_model configured?
│           ├── Yes → switch model via SASE_MODEL_OVERRIDE → retry once
│           └── No → fail
Wait periods are interruptible — if the agent is killed during a wait, it stops immediately.
TUI Display¶
The ACE Agents tab reflects retry state (see Retry/Fallback Display):
- RETRYING (Ns) — Waiting before the next attempt (bold orange, with countdown)
- ↻N — Retry count annotation on running agents
- ▸Model — Fallback model annotation (e.g.,
↻3▸flash)
Metadata Tracking¶
After execution completes, retry metadata is written to done.json in the agent's artifacts directory:
```json
{
  "retry_count": 2,
  "retry_errors": ["An unexpected critical error occurred: ..."],
  "used_fallback": false
}
```
Source: src/sase/llm_provider/retry_config.py, src/sase/axe/run_agent_exec.py
Spawn-on-Retry¶
When ProviderRetryConfig.spawn_new_agent=True, a retryable error spawns a fresh detached child agent (as if
sase run -d had been invoked) instead of running the next attempt in-process. The failing parent transfers its
workspace claim to the child via transfer_workspace_claim() and exits with status FAILED (RETRIED). This trades the
small cost of a fresh process for two benefits:
- The workspace is preserved by design — the child skips `prepare_workspace()` and inherits the parent's in-progress edits via the transferred workspace claim. (Legacy in-process retry runs `prepare_workspace()` between attempts and wipes uncommitted file edits unless `preserve_workspace=True`.)
- A retry boundary becomes a real process boundary, which is more robust against memory leaks, lingering child processes, and stale interpreter state.
Linkage fields (written to both agent_meta.json and done.json so retry chains are queryable from either side):
| Field | Meaning |
|---|---|
| `retry_of_timestamp` | Backward link: the parent agent's run timestamp. |
| `retried_as_timestamp` | Forward link: the child agent's run timestamp (written on the parent at handoff). |
| `retry_chain_root_timestamp` | The root agent's timestamp — stable across the entire chain. |
| `retry_attempt` | Depth in the chain (1-based). |
State is carried across the boundary by a retry_handoff.json file written to the parent's artifacts directory; the
child reads it before launch.
Fallback behavior: spawn-on-retry is opt-in (default false). If spawning fails (e.g. workspace transfer fails),
the legacy in-process retry runs as a fallback so the user is never worse off.
Source: src/sase/axe/run_agent_retry_spawn.py, src/sase/llm_provider/retry_config.py
Token Usage Tracking¶
The LLM provider layer tracks token usage for Claude Code agent runs. Input tokens, output tokens, and cache-read tokens
are extracted from the Claude Code stream-json result events and persisted as a usage.json artifact in the agent run
directory.
Artifact Format¶
```json
{
  "input_tokens": 12345,
  "output_tokens": 6789,
  "cache_read_tokens": 3456
}
```
When telemetry is enabled, token counts are also recorded as Prometheus counters (sase_llm_input_tokens_total,
sase_llm_output_tokens_total, sase_llm_cache_read_tokens_total) for monitoring and dashboards. See
docs/telemetry.md for the full telemetry reference.
Source: src/sase/llm_provider/_subprocess.py, src/sase/llm_provider/types.py
Prompt Preprocessing Pipeline¶
Before any prompt reaches a provider, it passes through a 6-step preprocessing pipeline defined in preprocessing.py.
Steps¶
| # | Step | Syntax | Description |
|---|---|---|---|
| 1 | xprompt references | `#name` | Expand reusable inline prompt snippets from xprompts |
| 2 | Command substitution | `$(cmd)` | Execute shell commands and inline their output |
| 3 | File references | `@path` | Inline file contents (copy absolute/tilde paths) |
| 4 | Jinja2 rendering | `{{ var }}` | Render Jinja2 templates after all prior expansions |
| 5 | Prettier formatting | - | Format with prettier for consistent markdown |
| 6 | Comment stripping | `<!-- ... -->` | Remove HTML/markdown comments |
Order Matters¶
The pipeline runs in strict order. Jinja2 rendering (step 4) happens after xprompt, command substitution, and file reference expansion, so templates can reference content injected by earlier steps.
Home Mode¶
When is_home_mode=True, file reference processing skips copying files (step 3). This is used when the invocation
doesn't need side effects from @path references.
Source Functions¶
The preprocessing steps delegate to functions from two libraries:
- `xprompt`: `process_xprompt_references()`, `is_jinja2_template()`, `render_toplevel_jinja2()`
- `gemini_wrapper.file_references`: `process_command_substitution()`, `process_file_references()`, `format_with_prettier()`, `strip_html_comments()`
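Chained together, the pipeline is roughly the following; the call signatures are simplified assumptions, so consult the two libraries for the real parameter lists:

```python
from xprompt import is_jinja2_template, process_xprompt_references, render_toplevel_jinja2
from gemini_wrapper.file_references import (
    format_with_prettier,
    process_command_substitution,
    process_file_references,
    strip_html_comments,
)


def preprocess_prompt(prompt: str) -> str:
    # Signatures simplified for illustration; the real pipeline lives in preprocessing.py.
    prompt = process_xprompt_references(prompt)    # 1. expand #name snippets
    prompt = process_command_substitution(prompt)  # 2. inline $(cmd) output
    prompt = process_file_references(prompt)       # 3. inline @path contents (home mode skips copying)
    if is_jinja2_template(prompt):                 # 4. render {{ var }} templates
        prompt = render_toplevel_jinja2(prompt)
    prompt = format_with_prettier(prompt)          # 5. normalize markdown
    return strip_html_comments(prompt)             # 6. drop <!-- ... --> comments
```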
Subprocess Streaming¶
Providers use shared helpers in _subprocess.py to stream LLM output in real-time.
Mechanism¶
1. The provider spawns the CLI tool via `subprocess.Popen` with `stdout=PIPE` and `stderr=PIPE`; providers that consume prompts from stdin also set `stdin=PIPE`.
2. The prompt is supplied using the provider's documented transport, either stdin or an argv message argument.
3. Both stdout and stderr file descriptors are set to non-blocking mode via `os.set_blocking()`.
4. A `select.select()` loop with a 0.1s timeout polls for readable data on both streams.
5. Lines are read and optionally printed to the console in real time.
6. After the process exits (`process.poll() is not None`), any remaining buffered output is drained.
7. The function returns `(stdout_content, stderr_content, return_code)`.
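A stripped-down version of that loop (live-reply writing, output suppression, and error handling omitted; the loop here drains until EOF rather than mirroring the exact poll/drain structure, and buffer sizes and decoding choices are illustrative):

```python
import os
import select
import subprocess


def stream_output(cmd: list[str], prompt: str) -> tuple[str, str, int]:
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    proc.stdin.write(prompt.encode())  # large prompts would need incremental writes
    proc.stdin.close()
    buffers = {proc.stdout.fileno(): [], proc.stderr.fileno(): []}
    for fd in buffers:
        os.set_blocking(fd, False)  # non-blocking reads on both pipes
    open_fds = set(buffers)
    while open_fds:
        readable, _, _ = select.select(list(open_fds), [], [], 0.1)  # 0.1s poll
        for fd in readable:
            chunk = os.read(fd, 65536)
            if not chunk:
                open_fds.discard(fd)  # EOF: this pipe is fully drained
                continue
            buffers[fd].append(chunk)
            print(chunk.decode(errors="replace"), end="", flush=True)  # real-time echo
    returncode = proc.wait()
    stdout = b"".join(buffers[proc.stdout.fileno()]).decode(errors="replace")
    stderr = b"".join(buffers[proc.stderr.fileno()]).decode(errors="replace")
    return stdout, stderr, returncode
```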
Live Reply File¶
When SASE_ARTIFACTS_DIR is set, the streaming output is also written in real-time to
<SASE_ARTIFACTS_DIR>/live_reply.md. This file is used by the ACE TUI Agents tab to display the agent's reply as it
streams in, and remains available after execution completes for the metadata panel's AGENT REPLY section.
Output Suppression¶
When suppress_output=True, lines are still captured but not printed to the console. This is used for background
invocations where the caller only needs the final result.
Postprocessing¶
After a provider returns (or raises an error), the orchestration layer runs postprocessing steps.
On Success (postprocess_success)¶
- **Audio notification**: Plays a sound via `run_bam_command("Agent reply received")` (skipped if `suppress_output`).
- **Log to sase.md**: Appends a timestamped entry with the prompt and response to `<artifacts_dir>/sase.md` (if `artifacts_dir` is set).
- **Save chat history**: Writes to `~/.sase/chats/` if `workflow` is set. See Chat History.
On Error (postprocess_error)¶
- **Rich error display**: Prints the prompt and error via `print_prompt_and_response()` with an `_ERROR` suffix on the agent type label (skipped if `suppress_output`).
- **Log to sase.md**: Same as success, but the response is the error message and the agent type gets an `_ERROR` suffix.
- **Save error chat history**: Writes to `~/.sase/chats/` with an `_ERROR` agent suffix.
sase.md Log Format¶
Each entry in the log file follows this format:
## <timestamp> - <agent_type> - iteration <N> - tag <workflow_tag>

### PROMPT:
\`\`\`
<prompt text>
\`\`\`

### RESPONSE:
\`\`\`
<response text>
\`\`\`

---
Prompt File Saving¶
Before invocation, the preprocessed prompt is saved to <artifacts_dir>/<agent_type>_prompt.md (or
<agent_type>_iter_<N>_prompt.md if an iteration number is set). This allows reviewing the exact prompt that was sent.
Chat History¶
Chat histories are stored as markdown files in ~/.sase/chats/.
File Naming¶
<branch_or_workspace>-<workflow>-[<agent>-]<timestamp>.md
| Part | Source | Example |
|---|---|---|
| `branch_or_workspace` | Output of `branch_or_workspace_name` | `my_feature` |
| `workflow` | Workflow name, normalized | `crs`, `run` |
| `agent` | Agent type (omitted if same as workflow) | `editor`, `planner` |
| `timestamp` | `YYmmdd_HHMMSS` format | `260214_153042` |
Dashes and slashes in workflow names are normalized to underscores.
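Building that filename is roughly the following (the helper name here is illustrative; `branch_or_workspace_name` is the function referenced in the table):

```python
from datetime import datetime


def chat_history_filename(branch_or_workspace: str, workflow: str, agent: str | None = None) -> str:
    """Assemble the ~/.sase/chats/ filename from the parts in the table above."""
    workflow_part = workflow.replace("-", "_").replace("/", "_")  # normalize dashes/slashes
    parts = [branch_or_workspace, workflow_part]
    if agent and agent != workflow:
        parts.append(agent)  # agent segment is omitted when it matches the workflow
    timestamp = datetime.now().strftime("%y%m%d_%H%M%S")  # YYmmdd_HHMMSS
    return "-".join(parts) + f"-{timestamp}.md"
```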
File Format¶
# Chat History - <workflow> (<agent>)
**Timestamp:** <display_timestamp>
## Previous Conversation
<previous history if resuming>
---
## Prompt
<prompt text>
## Response
<response text>
Resume Support¶
The sase run --resume flag resumes a previous conversation by agent name. The #resume workflow resolves the agent
name to its artifacts directory, extracts the response path from done.json, and delegates to #resume_by_chat which
loads the chat history and prepends it to the new conversation. The --resume flag also accepts a history file basename
or full path for direct chat-file-based resumption via the #resume_by_chat workflow.
Resume expansion is recursive: if the loaded chat history itself contains #resume or #resume_by_chat references,
those are expanded inline as well. Cycle detection prevents infinite loops when chat histories reference each other.
Invocation Lifecycle¶
The invoke_agent() function in _invoke.py orchestrates the complete lifecycle of an LLM invocation. Here is the
end-to-end flow:
invoke_agent(prompt, agent_type, model_tier, ...)
│
├── 1. Handle deprecated model_size → model_tier mapping
├── 2. Check SASE_MODEL_TIER_OVERRIDE / SASE_MODEL_SIZE_OVERRIDE env vars
├── 3. Build LoggingContext from parameters
│
├── 4. Preprocess prompt (6-step pipeline)
│ ├── xprompt references (#name)
│ ├── Command substitution ($(cmd))
│ ├── File references (@path)
│ ├── Jinja2 rendering ({{ var }})
│ ├── Prettier formatting
│ └── Comment stripping
│
├── 5. Display decision counts (if not suppressed)
├── 6. Print prompt via Rich (if not suppressed)
├── 7. Generate or use provided timestamp
├── 8. Save prompt to artifacts directory
│
├── 9. Get provider from registry and invoke
│ ├── Build CLI command with flags
│ ├── Spawn subprocess (Popen)
│ ├── Supply prompt via provider transport
│ └── Stream stdout/stderr in real-time
│
├── 10. Postprocess
│ ├── Success path:
│ │ ├── Audio notification
│ │ ├── Log to sase.md
│ │ └── Save chat history
│ └── Error path:
│ ├── Rich error display
│ ├── Log error to sase.md
│ └── Save error chat history
│
└── 11. Return AIMessage(content=response)
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | (required) | Raw prompt to send |
| `agent_type` | `str` | (required) | Agent type label (e.g., "editor") |
| `model_tier` | `ModelTier` | `"large"` | Model tier to use |
| `model_size` | `"big" \| "little" \| None` | `None` | Deprecated, use `model_tier` |
| `iteration` | `int \| None` | `None` | Iteration number for logging |
| `workflow_tag` | `str \| None` | `None` | Workflow tag for logging |
| `artifacts_dir` | `str \| None` | `None` | Directory for sase.md and prompt files |
| `workflow` | `str \| None` | `None` | Workflow name for chat history |
| `suppress_output` | `bool` | `False` | Suppress console output |
| `timestamp` | `str \| None` | `None` | Shared timestamp (YYmmdd_HHMMSS) |
| `is_home_mode` | `bool` | `False` | Skip file copying for @ references |
| `decision_counts` | `dict[str, Any] \| None` | `None` | Planning agent decision counts |
| `provider_name` | `str \| None` | `None` | Override provider (default from config) |
Return Value¶
Always returns an AIMessage (from langchain_core.messages). On error, the content field contains the error message
rather than a response.
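A typical call, assuming `invoke_agent` is re-exported from the package `__init__` (listed above as "Public API exports"):

```python
from sase.llm_provider import invoke_agent

reply = invoke_agent(
    "Summarize the failing tests and propose a fix plan.",
    agent_type="editor",
    model_tier="small",
    workflow="run",
    artifacts_dir="/tmp/sase-artifacts",  # sase.md and prompt files land here
    suppress_output=True,  # background invocation: only the final result is needed
)
print(reply.content)  # AIMessage; .content holds the response (or error) text
```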