LLM Provider Integration¶
This document describes the LLM provider abstraction layer in sase. The system supports pluggable LLM backends (Claude
Code, Codex, Antigravity CLI (agy), Qwen Code, and OpenCode are bundled; additional providers can ship as external
plugins) behind a shared orchestration layer that handles preprocessing, invocation, and postprocessing.
Table of Contents¶
- Overview
- Provider Architecture
- Commit Finalization
- Claude Code Integration
- Antigravity (agy) Integration
- Codex CLI Integration
- Qwen Code Integration
- OpenCode Integration
- External Provider Plugins
- Configuration
- Per-Prompt Provider Switching
- Model Tier System
- Worker Model
- Temporary Default Override
- Environment Variables
- CLI Flags
- Retry and Fallback
- Token Usage Tracking
- Prompt Preprocessing Pipeline
- Subprocess Streaming
- Postprocessing
- Chat History
- Invocation Lifecycle
Overview¶
The LLM provider layer decouples prompt handling from the underlying LLM backend. All providers share a common preprocessing pipeline, subprocess streaming mechanism, and postprocessing workflow. The actual LLM invocation is delegated to a pluggable provider selected at runtime.
Key design principles:
- Providers are thin: They only construct CLI commands and run subprocesses. All preprocessing and postprocessing lives in the shared orchestration layer.
- Registry-based selection: Providers register themselves by name and are resolved via config or explicit override.
- Tier-based model selection: Callers request a "large" or "small" tier; the provider maps it to a concrete model.
- Runtime-uniform commit enforcement: SASE agent sessions use a shared commit finalizer instead of provider-specific native stop hooks.
Source Layout¶
| File | Purpose |
|---|---|
| `src/sase/llm_provider/__init__.py` | Public API exports |
| `src/sase/llm_provider/base.py` | `LLMProvider` abstract base class |
| `src/sase/llm_provider/_hookspec.py` | Pluggy hook specifications (`LLMHookSpec`) |
| `src/sase/llm_provider/_plugin_manager.py` | Plugin manager wrapping pluggy (`LLMPluginManager`) |
| `src/sase/llm_provider/claude.py` | Claude Code provider implementation |
| `src/sase/llm_provider/codex.py` | Codex CLI provider implementation |
| `src/sase/llm_provider/agy.py` | Antigravity CLI (`agy`) provider implementation |
| `src/sase/llm_provider/qwen.py` | Qwen Code provider implementation |
| `src/sase/llm_provider/opencode.py` | OpenCode provider implementation |
| `src/sase/llm_provider/registry.py` | Provider registration and lookup |
| `src/sase/llm_provider/config.py` | Config file reader (`sase.yml`) |
| `src/sase/llm_provider/temporary_override.py` | Primary/worker temporary override state and resolution |
| `src/sase/llm_provider/commit_finalizer.py` | Provider-neutral dirty-workspace finalizer |
| `src/sase/llm_provider/types.py` | `ModelTier`, `InvokeResult`, `LoggingContext` types |
| `src/sase/llm_provider/_invoke.py` | `invoke_agent()` orchestrator |
| `src/sase/llm_provider/_subprocess.py` | Provider stream-parser compatibility exports |
| `src/sase/llm_provider/_plan_utils.py` | Shared plan utilities |
| `src/sase/llm_provider/preprocessing.py` | Shared prompt preprocessing pipeline |
| `src/sase/llm_provider/postprocessing.py` | Logging, chat history, audio |
| `src/sase/llm_provider/retry_config.py` | `ProviderRetryConfig` (per-provider retry defaults) |
Provider Architecture¶
Base Class¶
All providers implement the LLMProvider abstract base class:
class LLMProvider(ABC):
    @abstractmethod
    def invoke(
        self,
        prompt: str,
        *,
        model_tier: ModelTier,
        suppress_output: bool = False,
        model_override: str | None = None,
    ) -> InvokeResult: ...
| Parameter | Type | Description |
|---|---|---|
| `prompt` | `str` | Already-preprocessed prompt text |
| `model_tier` | `ModelTier` | `"large"` or `"small"` |
| `suppress_output` | `bool` | If `True`, suppress real-time console output |
| `model_override` | `str \| None` | Concrete model name from `%model`, a temporary override, or retry |
Returns InvokeResult(content=..., usage=...). Providers raise subprocess.CalledProcessError for failed CLI exits or
a provider-specific exception for launch/configuration failures.
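As a rough illustration of the contract above, the following sketch implements a toy provider against stand-in types. `EchoProvider`, its `TIER_MODELS` table, and the simplified `InvokeResult` dataclass are hypothetical and exist only for this example; the real types live in `base.py` and `types.py` and may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Literal, Optional

ModelTier = Literal["large", "small"]


@dataclass
class InvokeResult:
    # Stand-in for the real type; only the two documented fields are modeled.
    content: str
    usage: Optional[dict] = None


class LLMProvider(ABC):
    @abstractmethod
    def invoke(
        self,
        prompt: str,
        *,
        model_tier: ModelTier,
        suppress_output: bool = False,
        model_override: Optional[str] = None,
    ) -> InvokeResult: ...


class EchoProvider(LLMProvider):
    """Hypothetical provider that echoes the prompt instead of running a CLI."""

    # Hypothetical tier-to-model table; real providers define their own.
    TIER_MODELS = {"large": "echo-large", "small": "echo-small"}

    def invoke(self, prompt, *, model_tier, suppress_output=False, model_override=None):
        # A concrete override wins over the tier default.
        model = model_override or self.TIER_MODELS[model_tier]
        content = f"[{model}] {prompt}"
        if not suppress_output:
            print(content)
        return InvokeResult(content=content, usage=None)
```

The provider stays thin here: it only maps the tier to a model and produces a result, leaving preprocessing and postprocessing to the caller.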
Registry¶
Providers are discovered via importlib.metadata.entry_points(group="sase_llm"). The built-in providers are packaged
the same way as external provider plugins; their entry points live in pyproject.toml:
[project.entry-points."sase_llm"]
claude = "sase.llm_provider.claude:ClaudeCodeProvider"
codex = "sase.llm_provider.codex:CodexProvider"
agy = "sase.llm_provider.agy:AgyProvider"
opencode = "sase.llm_provider.opencode:OpenCodeProvider"
qwen = "sase.llm_provider.qwen:QwenProvider"
External plugin packages declare additional entries under the same group.
To get a provider instance:
provider = get_provider() # Uses default from config
provider = get_provider("claude") # Explicit provider name
Selection Logic¶
1. If `provider_name` is passed to `invoke_agent()`, use that.
2. If the prompt has a `%model` directive, resolve explicit `provider/model` syntax first, then known model names from installed plugin metadata.
3. If no explicit provider/model was supplied, use an active temporary override from `~/.sase/llm_override.json`.
4. Otherwise, read the `llm_provider.provider` field from `~/.config/sase/sase.yml`.
5. If no config exists (or provider is empty), auto-detect by walking registered plugins in ascending `llm_autodetect_priority()` order and picking the first whose `llm_autodetect_cli_name()` is on `PATH`. Built-in priorities: `claude=0`, `codex=10`, `qwen=15`, `opencode=18`, `agy=30`. External plugins slot in by declaring their own priority. `agy` autodetects via the `agy` CLI name in the late-fallback slot.
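The auto-detect step can be sketched as follows. This is a simplified illustration, not the registry code: the `BUILTIN_PLUGINS` tuple list and the `autodetect_provider` function are hypothetical, and the CLI names are assumed to equal the provider names. Only the priorities come from the list above.

```python
import shutil
from typing import Callable, Optional

# (provider name, autodetect priority, CLI name); priorities as listed above.
BUILTIN_PLUGINS = [
    ("agy", 30, "agy"),
    ("claude", 0, "claude"),
    ("opencode", 18, "opencode"),
    ("codex", 10, "codex"),
    ("qwen", 15, "qwen"),
]


def autodetect_provider(
    plugins=BUILTIN_PLUGINS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the lowest-priority-number provider whose CLI is on PATH."""
    for name, _priority, cli_name in sorted(plugins, key=lambda p: p[1]):
        if which(cli_name) is not None:
            return name
    return None
```

Passing `which` as a parameter keeps the lookup testable without touching the real `PATH`.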
Commit Finalization¶
After a provider returns successfully, invoke_agent() runs the provider-neutral commit finalizer before success
postprocessing when the process is a SASE agent session (SASE_AGENT_TIMESTAMP is set). The finalizer checks the active
project workspace through the active VCS provider and checks configured linked repositories as Git worktrees at their
resolved workspace_dir. If it finds dirty enforced work, it sends the same provider a bounded follow-up prompt that
lists the dirty files and instructs the agent to use the appropriate commit skill, such as /sase_git_commit. Dirty
static linked repos (workspace.strategy: none) are included in that prompt only as advisory work and do not fail the
run if they remain dirty. A narrow generated SDD plan closeout, where the only enforced change is one markdown file's
frontmatter status: wip becoming status: done, is committed directly with a TYPE=sdd commit instead of consuming a
provider follow-up pass.
The finalizer skips when the call is outside a SASE agent session, when commit.finalizer.enabled is false, or when
SASE_DISABLE_COMMIT_STOP_HOOK=1 is set. When an artifacts directory is available, each follow-up pass writes
commit_finalizer_pass_<N>_prompt.md and commit_finalizer_pass_<N>_response.md; the final outcome is written to
commit_finalizer_result.json. If the workspace remains dirty after commit.finalizer.max_passes, the invocation is
converted into an LLMInvocationError rather than being logged as a successful clean run.
The older provider-native commit hook scripts are no longer shipped; SASE-launched agent sessions rely on the shared finalizer path.
Claude Code Integration¶
The ClaudeCodeProvider invokes the claude CLI tool.
Command Construction¶
claude -p --verbose --model <alias> --output-format stream-json --dangerously-skip-permissions --session-id <uuid> [extra_args...]
The prompt is written to stdin. Output is streamed as JSON events; SASE extracts assistant text and token usage from the stream.
Model Mapping¶
| Tier | Claude CLI Alias |
|---|---|
| `large` | `opus` |
| `small` | `sonnet` |
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_LLM_LARGE_ARGS` | Extra CLI args for large tier (generic, preferred) |
| `SASE_LLM_SMALL_ARGS` | Extra CLI args for small tier (generic, preferred) |
| `SASE_CLAUDE_LARGE_ARGS` | Extra CLI args for large tier (Claude-specific fallback) |
| `SASE_CLAUDE_SMALL_ARGS` | Extra CLI args for small tier (Claude-specific fallback) |
The generic SASE_LLM_*_ARGS variables take precedence. Values are split on whitespace and appended to the command.
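One way to picture the precedence rule is the sketch below. The `extra_args` helper is hypothetical, and it assumes that "take precedence" means the provider-specific variable is consulted only when the generic one is unset or blank; the actual provider code may combine the two differently.

```python
import os
from typing import Mapping


def extra_args(tier: str, provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Resolve extra CLI args: generic variable first, provider-specific as fallback."""
    generic = env.get(f"SASE_LLM_{tier.upper()}_ARGS", "")
    specific = env.get(f"SASE_{provider.upper()}_{tier.upper()}_ARGS", "")
    # Assumption: a non-blank generic value fully replaces the specific one.
    chosen = generic if generic.strip() else specific
    # Values are split on whitespace, so quoting inside a value is not honored.
    return chosen.split()
```

Because the split is on plain whitespace, an argument that itself contains spaces cannot be expressed through these variables in this sketch.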
Timer Display¶
While waiting for a response, a provider_timer("Waiting for Claude") spinner is shown (unless suppress_output is
True).
Claude Tool-Call Hooks¶
To record what tools an agent actually invoked (file reads, edits, bash commands, etc.), ClaudeCodeProvider.invoke()
asks Claude Code to call back into SASE every time a tool runs. It does this by writing a pair of PreToolUse and
PostToolUse hook entries into the workspace's .claude/settings.local.json for the duration of the agent run. Each
entry matches all tools ("matcher": "*") and invokes the sase_claude_tool_hook console script, which reads the
Claude-supplied JSON payload from stdin and appends one normalized record (schema version 3) to
$SASE_ARTIFACTS_DIR/tool_calls.jsonl:
- The `PreToolUse` hook writes a pending entry capturing the tool name and a redacted version of its input.
- The `PostToolUse` hook writes the matching result entry: success/failure/interrupted status, the call's duration, and a length-bounded preview of the response.
The ACE Tools panel reads this same tool_calls.jsonl to render the per-agent timeline — see
Agents Tab Tools Panel.
Installation and cleanup are wrapped in a claude_hooks_session() context manager that is careful not to corrupt
user-managed Claude settings:
- Writes to `.claude/settings.local.json` go through `tmp + os.replace` so a killed agent cannot leave a half-written file behind.
- Each SASE-installed hook command carries a `_sase_managed` sentinel value. On exit, cleanup removes only entries carrying that sentinel; any pre-existing user or project hooks (including hooks for unrelated events such as `Notification`) are left untouched.
- "Home-mode" launches (agents started outside a tracked workspace, identified by the absence of all three of `SASE_GIT_WORKSPACE_DIR`, `SASE_CD_WORKSPACE_DIR`, and `SASE_ACTIVE_PROJECT_DIR`) skip the settings mutation entirely. They emit a `claude_hooks_skipped` diagnostic to `tool_calls_writer_errors.jsonl` so the operator can see why the hook records are missing, and rely on the stream-derived fallback writer (below) to populate the timeline.
- If `.claude/settings.local.json` exists but is malformed JSON, it is left alone, the run logs a diagnostic, and the fallback writer takes over.
- If SASE created the file (it did not pre-exist) and only SASE entries remain at exit, both the file and an empty `.claude` directory are removed so the workspace is left clean.
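The write-then-replace pattern and the sentinel-based cleanup described in this list can be sketched roughly as below. Both helper names are hypothetical, and the way the sentinel is detected (a substring match on the serialized hook entry) is an assumption made for illustration; the real logic lives in `_claude_hooks.py`.

```python
import json
import os
import tempfile

SENTINEL = "_sase_managed"


def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def strip_managed_hooks(settings: dict) -> dict:
    """Return a copy of settings without hook entries that carry the sentinel."""
    cleaned = {}
    for event, entries in settings.get("hooks", {}).items():
        # Assumption: the sentinel appears somewhere inside the serialized entry.
        kept = [entry for entry in entries if SENTINEL not in json.dumps(entry)]
        if kept:
            cleaned[event] = kept
    result = {key: value for key, value in settings.items() if key != "hooks"}
    if cleaned:
        result["hooks"] = cleaned
    return result
```

Creating the temp file in the target's own directory is what keeps the final `os.replace` on a single filesystem.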
The collector script itself is intentionally non-blocking: malformed JSON, non-object payloads, exceptions inside the
collector, a missing SASE_ARTIFACTS_DIR, and unrecognized hook event names all produce a best-effort diagnostic (or a
silent no-op when stdin is empty) and exit 0. This guarantees that a SASE-side bug can never make Claude surface the
hook as a tool-call failure to the agent.
The hook-based writer coexists with a stream-derived fallback writer in the LLM provider layer, which parses tool calls
out of the Claude streaming response. Both writers append to the same artifact, and the Tools-panel reader accepts
schema versions 1, 2, and 3. When hook and stream records describe the same tool_use_id, the reader keeps the
hook-derived record and suppresses the duplicate stream-derived row; otherwise, older stream-only artifacts remain
readable.
The normalized tool-call artifact is still Python/TUI-owned glue rather than a shared sase-core contract. Move it into
../sase-core only if another frontend or integration needs to produce or consume exactly the same schema through the
Rust boundary.
Source: src/sase/llm_provider/claude.py, src/sase/llm_provider/_claude_hooks.py,
src/sase/llm_provider/_tool_calls.py, src/sase/scripts/sase_claude_tool_hook.py, src/sase/ace/tui/tools/reader.py
Antigravity (agy) Integration¶
The AgyProvider invokes Google's Antigravity CLI (agy), the replacement for the retired consumer Gemini CLI. It is a
plain-stdout provider: Antigravity CLI 1.0.10 does not document a machine-readable JSON/stream output mode, so SASE
streams plain stdout instead of parsing a structured event stream.
Command Construction¶
agy --print-timeout <duration> --model <model> --dangerously-skip-permissions --add-dir <workspace> --print <prompt>
The prompt is passed as the value of --print (not on stdin) as a single argv element, so prompts containing quotes,
newlines, or shell metacharacters are never shell-interpolated. --print-timeout defaults to 24h (Antigravity's own
5m default is too short for long agentic runs) and is a Go duration string.
SASE pins Antigravity to the agent workspace in two ways: it launches the subprocess with cwd=<workspace> and passes
--add-dir <workspace> to the CLI. The workspace is resolved from SASE_ACTIVE_PROJECT_DIR, then provider project and
workspace env vars, and finally the current working directory.
Because the current Antigravity CLI does not document a stable stdin or prompt-file contract for print mode, SASE cannot
fall back to streaming the prompt when that single argv element becomes too large for the OS. AgyProvider therefore
rejects prompts above a conservative 120 KiB UTF-8 guard before spawning agy, with an error that names the upstream
argv transport limitation and asks the user to reduce the prompt or use a stdin-capable provider.
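A minimal sketch of such a guard follows. The names `MAX_PROMPT_BYTES`, `PromptTooLargeError`, and `check_prompt_size` are hypothetical, as is the error wording; only the 120 KiB UTF-8 limit comes from the text above.

```python
MAX_PROMPT_BYTES = 120 * 1024  # 120 KiB, measured on the UTF-8 encoding


class PromptTooLargeError(ValueError):
    """Hypothetical error type raised before the subprocess is spawned."""


def check_prompt_size(prompt: str, limit: int = MAX_PROMPT_BYTES) -> int:
    """Return the encoded size, or raise if the prompt exceeds the argv guard."""
    size = len(prompt.encode("utf-8"))
    if size > limit:
        raise PromptTooLargeError(
            f"prompt is {size} bytes, above the {limit}-byte argv guard; "
            "reduce the prompt or use a stdin-capable provider"
        )
    return size
```

The check counts encoded bytes rather than characters because non-ASCII text occupies more than one byte per character on the command line.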
Before invoking agy --print, SASE wraps the user prompt with a compact print-mode directive. It tells the model that
tool approval has already been granted by --dangerously-skip-permissions, commands must run synchronously, background
tasks should not be used because print mode has no event loop for later notifications, and the final answer must be
written directly to stdout.
Print-Mode No-Progress Recovery¶
Antigravity's run_command tool can dispatch long-running commands as background tasks. In an interactive Antigravity
session, the UI can deliver the later completion notification and the model can continue. In agy --print, SASE starts
a single non-interactive process and reads stdout; there is no follow-up event loop. Some models therefore end the print
turn with prose such as "I will wait to be notified" or "please approve the command" even though the subprocess exits
0.
AgyProvider treats those replies as no-progress, not success. When the supported trajectory extractor is available,
SASE first checks the structural diff: zero tool-use steps or a final pending/backgrounded run_command step triggers
recovery. When trajectory data is unavailable, a conservative text heuristic catches planning-only/waiting replies. SASE
then restarts agy --print with accumulated context and a provider-local continuation nudge that asks the model to run
tools synchronously and output the final answer. If the reply still makes no progress after the bounded continuation
budget, invoke() raises LLMInvocationError so the run fails loudly instead of writing a false-success answer.
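The text-heuristic branch of this recovery loop might look something like the sketch below. Everything here is illustrative: the phrase list, the nudge wording, the way context is accumulated, and the `NoProgressError` stand-in for `LLMInvocationError` are assumptions, and the trajectory-based structural check is omitted.

```python
from typing import Callable

# Hypothetical phrases; the real heuristic is more conservative and not shown here.
WAITING_PHRASES = ("i will wait to be notified", "please approve the command")


class NoProgressError(RuntimeError):
    """Stand-in for LLMInvocationError in this sketch."""


def looks_like_no_progress(reply: str) -> bool:
    lowered = reply.lower()
    return any(phrase in lowered for phrase in WAITING_PHRASES)


def run_with_continuations(
    run_once: Callable[[str], str], prompt: str, max_continuations: int = 2
) -> str:
    """Call run_once, re-prompting with a nudge while the reply shows no progress."""
    context = prompt
    for _attempt in range(max_continuations + 1):
        reply = run_once(context)
        if not looks_like_no_progress(reply):
            return reply
        # Carry the stalled reply forward and ask for synchronous completion.
        context = (
            f"{context}\n\n{reply}\n\n"
            "Run tools synchronously and output the final answer."
        )
    raise NoProgressError("reply made no progress within the continuation budget")
```

With the default budget of 2, the provider process would be started at most three times before the run fails.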
Model Mapping¶
agy model display names are used verbatim — they contain spaces and parentheses (e.g. Gemini 3.5 Flash (High)). The
tier defaults are:
| Tier | Model | Short alias |
|---|---|---|
| `large` | `Gemini 3.5 Flash (High)` | `flash35h` |
| `small` | `Gemini 3.5 Flash (Low)` | `flash35l` |
All other agy model names remain reachable through %model:agy/<exact name>, the model picker, and configured
aliases.
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_AGY_PATH` | Path to the Antigravity CLI binary (default: `"agy"`). |
| `SASE_AGY_PRINT_TIMEOUT` | Override the `agy --print-timeout` Go duration (default: `"24h"`). |
| `SASE_AGY_MAX_NO_PROGRESS_CONTINUATIONS` | Override the no-progress continuation cap (default: `2`). |
| `SASE_AGY_LARGE_ARGS` | Extra args for the large tier (after `SASE_LLM_LARGE_ARGS`). |
| `SASE_AGY_SMALL_ARGS` | Extra args for the small tier (after `SASE_LLM_SMALL_ARGS`). |
Skill Deployment¶
sase skill init -p agy writes generated SASE skills to ~/.gemini/antigravity-cli/skills/, the documented Antigravity
global skill path. The leading .gemini here is an Antigravity-owned path, not a Gemini CLI path.
Structured Artifacts Parity Gap¶
Antigravity CLI 1.0.10 exposes no stable machine-readable stdout contract: there is no documented
--output-format stream-json or JSON event mode. Because SASE will not scrape Antigravity's human TUI rendering to
synthesize artifacts, the agy provider preserves these invariants:
- Tool-call timeline: SASE never invents rows from stdout display glyphs or prose. For explicitly supported Antigravity versions, a guarded best-effort extractor may decode new rows from Antigravity's local trajectory DB and append `source="trajectory"` records to `tool_calls.jsonl`; otherwise the ACE Agents Tab Tools Panel shows nothing for `agy` runs.
- Usage accounting: `InvokeResult.usage` is `None` and no `usage.json` is written; `agy` print mode exposes no stable token counters.
- Thinking extraction: no thinking artifact is produced.
The plain-stdout path still writes live_reply.md (and live_reply_timestamps.jsonl) like every other provider, so the
final reply, chat history, and resume support work normally. These structured features are fast-follow work gated on a
future Antigravity machine-readable output/log/conversation contract.
Timer Display¶
While waiting for a response, a Waiting for Antigravity spinner is shown (unless suppress_output is True).
Codex CLI Integration¶
The CodexProvider invokes the OpenAI codex CLI tool.
Command Construction¶
Normal mode:
codex exec --model <model> --dangerously-bypass-approvals-and-sandbox --json --color never --skip-git-repo-check - [extra_args...]
The prompt is written to stdin. Output is streamed as NDJSON events, with assistant text extracted from item.completed
events.
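As a sketch of that extraction, the function below collects text from `item.completed` lines. The function name is hypothetical, and the inner field names (`item.type == "agent_message"` and `item.text`) are assumptions about the Codex event shape that may vary between Codex CLI versions.

```python
import json
from typing import Iterable


def extract_assistant_text(lines: Iterable[str]) -> str:
    """Join assistant message text found in item.completed NDJSON events."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip non-JSON noise rather than failing the whole run.
        if not isinstance(event, dict) or event.get("type") != "item.completed":
            continue
        item = event.get("item")
        # Assumed shape: {"type": "agent_message", "text": "..."}.
        if (
            isinstance(item, dict)
            and item.get("type") == "agent_message"
            and isinstance(item.get("text"), str)
        ):
            parts.append(item["text"])
    return "\n".join(parts)
```

Parsing line by line keeps the reader tolerant of partial or unrecognized events, which matters for a stream consumed while the subprocess is still running.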
Model Mapping¶
| Tier | Codex Model |
|---|---|
| `large` | `gpt-5.5` |
| `small` | `codex-mini-latest` |
Plan Handling¶
The Codex provider does not enable Codex CLI's native plan mode. SASE planning flows are implemented at the
orchestration layer through workflows, xprompts, and the sase_plan skill, so provider behavior stays consistent across
runtimes.
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_LLM_LARGE_ARGS` | Extra CLI args for large tier (generic, preferred) |
| `SASE_LLM_SMALL_ARGS` | Extra CLI args for small tier (generic, preferred) |
| `SASE_CODEX_PATH` | Path to the Codex CLI binary (default: `PATH`, then `NVM_BIN`) |
| `SASE_CODEX_LARGE_ARGS` | Extra CLI args for large tier (Codex-specific fallback) |
| `SASE_CODEX_SMALL_ARGS` | Extra CLI args for small tier (Codex-specific fallback) |
| `SASE_CODEX_DISABLE_SHADOW_HOME` | Set to `1` to disable the disposable Codex home |
The generic SASE_LLM_*_ARGS variables take precedence over SASE_CODEX_*_ARGS.
By default, SASE launches Codex with a per-invocation shadow CODEX_HOME under ~/.cache/sase/codex_home/. The shadow
home copies config.toml and symlinks other Codex home entries back to the real Codex home so Codex can read auth,
hooks, skills, logs, and caches while any config rewrites stay disposable. The shadow directory is removed after each
Codex subprocess exits. Set SASE_CODEX_DISABLE_SHADOW_HOME=1 to pass through the inherited environment directly for
debugging or emergency compatibility.
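The shadow-home idea can be sketched with a small context manager. The `shadow_home` helper and its parameters are hypothetical; the sketch assumes a POSIX filesystem where symlinks are available and does not reproduce how SASE names the directory or exports `CODEX_HOME`.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def shadow_home(real_home: str, cache_root: str):
    """Yield a disposable home: config.toml copied, everything else symlinked."""
    os.makedirs(cache_root, exist_ok=True)
    shadow = tempfile.mkdtemp(prefix="codex_home_", dir=cache_root)
    try:
        for entry in os.listdir(real_home):
            source = os.path.join(real_home, entry)
            target = os.path.join(shadow, entry)
            if entry == "config.toml":
                # Copy so that rewrites by the CLI never touch the real file.
                shutil.copy2(source, target)
            else:
                # Link so auth, skills, logs, and caches stay shared.
                os.symlink(source, target)
        yield shadow
    finally:
        # Removes the links themselves, not the real entries they point to.
        shutil.rmtree(shadow, ignore_errors=True)
```

A caller would pass the yielded path as `CODEX_HOME` in the subprocess environment for the duration of the `with` block.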
Codex Tool-Call Capture¶
SASE captures Codex tool calls from the codex exec --json NDJSON stream; it does not install Codex hooks or mutate
user Codex configuration for telemetry. When SASE_ARTIFACTS_DIR is present, the stream parser appends normalized Codex
records to $SASE_ARTIFACTS_DIR/tool_calls.jsonl for the ACE Agents Tab Tools Panel.
Current fixture coverage is based on Codex CLI 0.130.0. For stream items that expose both start and completion events
(command_execution, file_change, and named tool items), SASE writes ToolUse and ToolResult rows with
runtime: "codex" and source: "stream". The Tools-panel reader collapses those pairs into one row, preserving pending
rows while a command is still running and showing result previews, failure/interruption status, and duration when the
stream exposes enough data to compute it.
Older Codex stream shapes that only expose a completed function_call item remain readable as legacy FunctionCall
rows. Those records can show the tool name and compact input target, but they do not invent response summaries,
durations, or failure details that Codex did not emit.
Codex tool-call summaries use the same bounded and redacted artifact helpers as the other providers. Set
SASE_TOOL_LOG_FULL=1 only for explicit debugging sessions when raw tool input or output is needed in the local
artifact.
Timer Display¶
While waiting for a response, a provider_timer("Waiting for Codex") spinner is shown (unless suppress_output is
True).
Qwen Code Integration¶
The QwenProvider invokes the qwen CLI tool.
Command Construction¶
qwen --input-format text --output-format stream-json --yolo --model <model> [extra_args...]
The prompt is written to stdin using Qwen's text input mode. Output is streamed as JSON events; SASE extracts assistant
text from assistant events and falls back to the final result text when no assistant text is emitted.
Model Mapping¶
| Tier | Qwen Model |
|---|---|
| `large` | `qwen3.6-plus` |
| `small` | `qwen3-coder-flash` |
Authentication¶
Configure Qwen Code through its supported auth and settings flow before using it from SASE. Qwen OAuth free tier access ended on 2026-04-15; use API keys, Alibaba Cloud Coding Plan, OpenRouter, Fireworks, or another Qwen-supported provider instead of relying on the discontinued OAuth free tier.
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_LLM_LARGE_ARGS` | Extra CLI args for large tier (generic, preferred) |
| `SASE_LLM_SMALL_ARGS` | Extra CLI args for small tier (generic, preferred) |
| `SASE_QWEN_PATH` | Path to the Qwen Code CLI binary (default: `qwen`) |
| `SASE_QWEN_LARGE_ARGS` | Extra CLI args for large tier (Qwen-specific fallback) |
| `SASE_QWEN_SMALL_ARGS` | Extra CLI args for small tier (Qwen-specific fallback) |
The generic SASE_LLM_*_ARGS variables take precedence over SASE_QWEN_*_ARGS.
Qwen Code config is left in Qwen's normal locations (~/.qwen/settings.json and project .qwen/settings.json). SASE
does not create a shadow Qwen home in the first implementation: a local Qwen install was unavailable during this
phase, so it could not be verified whether a normal headless run mutates config.
Qwen Tool-Call Capture¶
SASE captures Qwen tool calls from the qwen --output-format stream-json event stream; it does not install Qwen hooks.
When SASE_ARTIFACTS_DIR is present, the stream parser normalizes Qwen's nested tool_use and tool_result blocks
into records appended to $SASE_ARTIFACTS_DIR/tool_calls.jsonl for the ACE
Agents Tab Tools Panel with runtime: "qwen" and source: "stream". Malformed or
unsupported tool-shaped events emit a diagnostic instead of producing a malformed record. The Tools-panel reader
collapses each start/result pair into a single row.
Commit Finalization¶
SASE-launched Qwen runs use the shared provider-neutral commit finalizer described above; active SASE settings do not need repo-local or global Qwen commit-hook configuration.
Timer Display¶
While waiting for a response, a provider_timer("Waiting for Qwen") spinner is shown (unless suppress_output is
True).
OpenCode Integration¶
The OpenCodeProvider invokes the opencode CLI tool.
Command Construction¶
opencode run --format json --dangerously-skip-permissions --model <provider/model> --dir <cwd> [extra_args...] <prompt>
The prompt is passed as OpenCode's run [message..] argument without shell interpolation. Output is streamed as JSONL
events; SASE extracts assistant text from text events, captures errors from error events, and accumulates token
counters from step_finish events when OpenCode reports them.
Model Mapping¶
OpenCode model IDs normally include an upstream provider prefix. Use %model:opencode/<provider/model> to route a
single SASE prompt to a concrete OpenCode model.
| Tier | OpenCode Model |
|---|---|
| `large` | `anthropic/claude-sonnet-4-5` |
| `small` | `openai/gpt-5-mini` |
Authentication and Config¶
Configure OpenCode through its normal auth and settings flow before using it from SASE. OpenCode stores auth under its
XDG data directory and reads config from its XDG config directory plus project .opencode config. Use opencode models
to inspect the models available in your configured OpenCode environment.
SASE deploys OpenCode skills under ~/.config/opencode/skills/, which OpenCode scans as part of its config directory.
SASE does not create a shadow OpenCode data/config home in this first implementation because OpenCode's normal headless
run writes session/database state under its XDG data directory while reading auth/config from the standard locations.
Environment Variables¶
| Variable | Description |
|---|---|
| `SASE_LLM_LARGE_ARGS` | Extra CLI args for large tier (generic, preferred) |
| `SASE_LLM_SMALL_ARGS` | Extra CLI args for small tier (generic, preferred) |
| `SASE_OPENCODE_PATH` | Path to the OpenCode CLI binary (default: `opencode`) |
| `SASE_OPENCODE_LARGE_ARGS` | Extra CLI args for large tier (OpenCode-specific fallback) |
| `SASE_OPENCODE_SMALL_ARGS` | Extra CLI args for small tier (OpenCode-specific fallback) |
The generic SASE_LLM_*_ARGS variables take precedence over SASE_OPENCODE_*_ARGS.
Timer Display¶
While waiting for a response, a provider_timer("Waiting for OpenCode") spinner is shown (unless suppress_output is
True).
External Provider Plugins¶
Additional LLM providers are shipped as external packages that declare [project.entry-points."sase_llm"] in their own
pyproject.toml. Plugins carry all their own metadata (model names, skill deploy path, CLI status color, auto-detect
priority, retry defaults) via pluggy @hookimpl methods — sase core has no plugin-specific branching.
External provider packages own their CLI invocation details, model metadata, skill deployment path, auto-detect
priority, and retry defaults. Install the provider package in the same environment as sase to make its sase_llm entry
point available.
Configuration¶
The LLM provider reads its configuration from ~/.config/sase/sase.yml under the llm_provider key.
Config File¶
llm_provider:
  provider: claude # or "codex", "qwen", "opencode", "agy" (default: auto-detect)
  worker_models:
    claude: codex/gpt-5.5 # worker default when primary is on Claude
    codex: claude/opus # worker default when primary is on Codex
  model_tier_map:
    large: opus
    small: sonnet
  model_aliases:
    other: claude/opus
Config Fields¶
| Field | Type | Default | Description |
|---|---|---|---|
| `llm_provider.provider` | string | auto-detect | Which registered provider to use. Auto-detects by plugin-declared priority; built-ins default to claude → codex → qwen → opencode → agy. |
| `llm_provider.worker_models` | dict | unset | Optional worker-lane targets for plan follow-ups and epic phase agents, keyed by the effective primary lane. Values accept aliases, bare models, or explicit `provider/model`. |
| `llm_provider.model_tier_map.large` | string | - | Model identifier for the large tier |
| `llm_provider.model_tier_map.small` | string | - | Model identifier for the small tier |
| `llm_provider.model_aliases` | dict | - | Model aliases for `%model:<alias>` / `%m:<alias>`. Values can be bare known models, explicit `provider/model`, or nested provider-local model paths. |
Per-Prompt Provider Switching¶
The %model directive (see xprompt directives) can switch both the model and the LLM provider
for a single prompt. Provider resolution uses configured aliases first, then concrete provider/model syntax and known
model metadata.
Configured Model Aliases¶
Use llm_provider.model_aliases to define launch-time aliases for reusable prompts:
llm_provider:
  model_aliases:
    other: claude/opus
Then prompts can use:
%model:other
%{%m:other | %m:gpt-5.5}
Alias values may point at another alias, a bare known model such as opus, an explicit provider/model string such as
claude/opus, or a nested provider-local path such as opencode/anthropic/claude-sonnet-4-5. Cycles are ignored and
fall back to the raw input.
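Alias chaining with the cycle fallback can be sketched as below. The `resolve_alias` function is hypothetical and covers only plain configured aliases; it does not model the reserved `other` and `worker` aliases or the later provider/model parsing.

```python
def resolve_alias(name: str, aliases: dict[str, str]) -> str:
    """Follow alias chains; on a cycle, give up and return the raw input."""
    seen = set()
    current = name
    while current in aliases:
        if current in seen:
            return name  # Cycle detected: fall back to what the user typed.
        seen.add(current)
        current = aliases[current]
    return current
```

For instance, with `{"other": "fast", "fast": "claude/opus"}` the name `other` resolves through `fast` to `claude/opus`, while a pair of aliases pointing at each other leaves the input unchanged.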
Reserved alias: other¶
The literal alias name other is reserved as a context-aware key. When a
temporary default override is active, %model:other (and %m:other) resolves to the
(provider, model) that was the effective default immediately before the override was set — captured in the
override's pre_override_* snapshot. When no override is active, other falls back to whatever the user configured
under llm_provider.model_aliases.other (or the literal model name other if no alias is configured).
This makes %{%m:other | %m:...} always pair "the alternate model" with the current default, even when the user has
temporarily switched their default via the ACE ,o chord. Without the snapshot, %{%m:other | %m:...} on an
override-displaced default could otherwise launch the override's model side-by-side with itself.
Reserved alias: worker¶
The literal alias name worker is reserved for the worker lane. %model:worker and %m(worker) resolve to the current
effective worker provider/model and shadow any llm_provider.model_aliases.worker entry.
This alias is how delegated launch sites opt into worker-lane selection without hardcoding a concrete model. For
example, sase bead work emits %model:worker for phase agents that do not have an explicit per-bead model.
Explicit Provider/Model Syntax¶
Use provider/model to specify both explicitly:
%model:codex/o3
%model:claude/opus
%model:agy/flash35h
%model:qwen/qwen3.6-plus
%model:opencode/anthropic/claude-sonnet-4-5
Automatic Provider Resolution¶
Known model names are automatically mapped to their provider:
| Model Name | Provider |
|---|---|
| `opus`, `sonnet`, `haiku`, `claude-fable-5` | claude |
| `gpt-5.5`, `gpt-5.3-codex`, `codex-mini-latest`, `o3`, `o4-mini`, `gpt-5.4`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4o`, `gpt-4o-mini` | codex |
| `Gemini 3.5 Flash (High)`, `Gemini 3.5 Flash (Medium)`, `Gemini 3.5 Flash (Low)`, `Gemini 3.1 Pro (High)`, `Gemini 3.1 Pro (Low)`, `Claude Sonnet 4.6 (Thinking)`, `Claude Opus 4.6 (Thinking)`, `GPT-OSS 120B (Medium)` | agy |
| `qwen3.6-plus`, `qwen3-coder-plus`, `qwen3-coder-flash`, `qwen3-max`, `qwen-plus`, `qwen-max` | qwen |
| `anthropic/claude-sonnet-4-5`, `anthropic/claude-opus-4-5`, `openai/gpt-5`, `openai/gpt-5-mini`, `google/gemini-3-flash-preview`, `qwen/qwen3-coder-plus` | opencode |
Each installed plugin contributes its own model names via the llm_known_model_names() hook.
For unrecognized model names, the prompt falls back to the default provider and a warning is logged at invocation time.
Source: src/sase/llm_provider/registry.py, src/sase/llm_provider/_invoke.py
Model Short Aliases¶
Providers also declare compact display shorthands for long model ids via the llm_model_short_aliases() hook. These
shorthands appear in provider/model agent-name suffixes on the Agents tab and act as
filter terms in the coder model picker. They are display-only: %model resolution uses known model names and
configured model aliases, not these shorthands. For example, %model:fable does not
select claude-fable-5 — it falls back to the default provider (with a warning) unless you define fable as a
configured model alias yourself.
| Provider | Shorthands |
|---|---|
| claude | claude-fable-5 → fable |
| codex | codex-mini-latest → mini, gpt-5.5 → gpt55, gpt-5.4 → gpt54, gpt-5.3-codex → gpt53, gpt-4.1 → gpt41, gpt-4.1-mini → gpt41m, gpt-4o-mini → gpt4om |
| agy | Gemini 3.5 Flash (High) → flash35h, Gemini 3.5 Flash (Medium) → flash35m, Gemini 3.5 Flash (Low) → flash35l, Gemini 3.1 Pro (High) → pro31h, Gemini 3.1 Pro (Low) → pro31l, Claude Sonnet 4.6 (Thinking) → sonnet46t, Claude Opus 4.6 (Thinking) → opus46t, GPT-OSS 120B (Medium) → gptoss120m |
| qwen | qwen3.6-plus → qwen36p, qwen3-coder-plus → qwen3cp, qwen3-coder-flash → qwen3cf |
| opencode | anthropic/claude-sonnet-4-5 → sonnet45, anthropic/claude-opus-4-5 → opus45, openai/gpt-5 → gpt5, openai/gpt-5-mini → gpt5m, google/gemini-3-flash-preview → flash3, qwen/qwen3-coder-plus → qwen3cp |
Source: llm_model_short_aliases() in each provider module under src/sase/llm_provider/
Model Tier System¶
The model tier system abstracts away specific model names. Callers request either "large" (most capable) or "small"
(faster/cheaper), and the provider maps the tier to a concrete model.
Type Definition¶
ModelTier = Literal["large", "small"]
Legacy Mapping¶
The old "big"/"little" terminology is still supported for backward compatibility:
| Old Value | New Tier | Display Label |
|---|---|---|
| "big" | "large" | BIG |
| "little" | "small" | LITTLE |
The model_size parameter on invoke_agent() is deprecated. Use model_tier instead.
Global Override¶
The model tier can be overridden globally via environment variable or CLI flag. The override forces ALL invocations to use the specified tier regardless of what the caller requests.
Resolution order:
- SASE_MODEL_TIER_OVERRIDE env var (accepts "large", "small", "big", "little")
- SASE_MODEL_SIZE_OVERRIDE env var (legacy, same values)
- --model-tier / --model-size CLI flag (sets the env var)
- Caller's model_tier parameter (default: "large")
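The resolution order above can be sketched as a small function. This is an illustrative sketch, not the code in sase: the names normalize_tier and resolve_tier are hypothetical, and it assumes the CLI flag has already been copied into the environment, as the list states.

```python
import os
from typing import Literal, Mapping, Optional

ModelTier = Literal["large", "small"]

# Legacy names accepted alongside the current tier names.
_LEGACY_TIERS = {"big": "large", "little": "small"}


def normalize_tier(value: str) -> ModelTier:
    """Map "big"/"little" onto "large"/"small" and validate the result."""
    cleaned = value.strip().lower()
    tier = _LEGACY_TIERS.get(cleaned, cleaned)
    if tier not in ("large", "small"):
        raise ValueError(f"unknown model tier: {value!r}")
    return tier  # type: ignore[return-value]


def resolve_tier(
    requested: ModelTier = "large",
    env: Optional[Mapping[str, str]] = None,
) -> ModelTier:
    """Return the override tier if one is set, else the caller's tier."""
    env = os.environ if env is None else env
    # The current variable is checked before the legacy alias.
    for name in ("SASE_MODEL_TIER_OVERRIDE", "SASE_MODEL_SIZE_OVERRIDE"):
        raw = env.get(name)
        if raw:
            return normalize_tier(raw)
    return normalize_tier(requested)
```

Passing env explicitly keeps the sketch testable without mutating the real process environment.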
Worker Model¶
The worker model is an optional secondary default for delegated execution work. It is used by plan follow-up agents when
the approval does not pick a specific follow-up model, and by sase bead work phase agents that do not have an explicit
per-bead model. Planning and landing agents stay on the primary default unless their prompt or bead explicitly asks for
a different model.
Configure it under llm_provider.worker_models:
llm_provider:
  provider: claude
  worker_models:
    claude: codex/gpt-5.5
    codex: claude/opus
Each key selects which worker target to use for the current effective primary lane. Keys are matched in this order:
exact provider/model first, bare model next, and provider last. Provider keys are defaults only, so claude/opus or
opus beats claude when both are present. Values accept the same syntax as %model: a bare known model (gpt-5.5),
a configured alias, an explicit provider/model pair (codex/gpt-5.5), or a nested provider-local model path.
For example:
llm_provider:
  worker_models:
    claude/opus: codex/gpt-5.5
    sonnet: codex/o3
    claude: agy/flash35h
With that config, primary claude/opus uses codex/gpt-5.5, primary claude/sonnet uses codex/o3, and other Claude
primary models use agy/flash35h.
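The key-matching order can be illustrated with a short lookup. This is a minimal sketch under the rules stated above; pick_worker_target is a hypothetical name, not a function exported by sase.

```python
from typing import Mapping, Optional


def pick_worker_target(
    provider: str,
    model: str,
    worker_models: Mapping[str, str],
) -> Optional[str]:
    """Return the worker target for the effective primary provider/model."""
    # Most specific key first: exact provider/model, bare model, then provider.
    for key in (f"{provider}/{model}", model, provider):
        target = worker_models.get(key)
        if target:
            return target
    return None  # no match: the worker lane falls through to the primary lane


# Mirrors the example config above.
worker_models = {
    "claude/opus": "codex/gpt-5.5",
    "sonnet": "codex/o3",
    "claude": "agy/flash35h",
}
```

With this mapping, pick_worker_target("claude", "opus", worker_models) selects codex/gpt-5.5, while an unlisted Claude model falls back to the provider-level entry.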
Lane Precedence¶
Primary launches and worker launches resolve through separate lanes. The worker lane falls through to the primary lane only when no worker-specific setting exists:
Primary lane:
1. explicit %model directive
2. active primary temporary override (~/.sase/llm_override.json)
3. llm_provider.provider + requested model tier
4. provider auto-detection
Worker lane:
1. explicit %model directive or per-bead model
2. active worker temporary override (~/.sase/llm_worker_override.json)
3. matching llm_provider.worker_models entry
4. primary lane steps 2-4
Because of that fallthrough, leaving worker_models unset, empty, or unmatched preserves the old behavior: worker
launches use the same effective default that a normal launch would have used. Active primary temporary overrides affect
which mapping key is selected, so a primary override to codex/o3 can match codex/o3, o3, or codex.
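The worker-lane fallthrough amounts to "first configured source wins". The sketch below assumes each source has already been resolved to a provider/model string or None; the function names and source labels are hypothetical and only loosely mirror the labels shown in the TUI.

```python
from typing import Optional, Sequence, Tuple

Candidate = Tuple[str, Optional[str]]  # (source label, "provider/model" or None)


def first_set(candidates: Sequence[Candidate]) -> Tuple[str, str]:
    """Return the first candidate that actually carries a value."""
    for source, value in candidates:
        if value:
            return source, value
    raise LookupError("no provider/model could be resolved")


def resolve_worker(
    directive: Optional[str] = None,
    worker_override: Optional[str] = None,
    worker_models_match: Optional[str] = None,
    primary_override: Optional[str] = None,
    configured_default: Optional[str] = None,
    autodetected: Optional[str] = None,
) -> Tuple[str, str]:
    """Worker lane: worker-specific sources first, then primary steps 2-4."""
    return first_set([
        ("directive", directive),
        ("worker override", worker_override),
        ("worker_models", worker_models_match),
        ("primary override", primary_override),
        ("config", configured_default),
        ("auto-detection", autodetected),
    ])
```

When no worker-specific source is set, the result is the same value the primary lane would have produced, which is the backward-compatible behavior described above.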
TUI Controls¶
Press ,o in ACE to open the Model Overrides panel. The panel shows both lanes, their current effective model, and
the source of that model (override, config, follows primary, or default). Use s/c/x for primary override
set/change/clear and w/W for worker override set/change/clear. Active temporary worker overrides also appear as a
compact W ... chip in the top bar; permanent worker_models config is visible in the modal instead.
The worker override state file is ~/.sase/llm_worker_override.json. It uses the same JSON format, expiry behavior, and
atomic writes as the primary override file.
Temporary Default Override¶
In addition to the tier-based global override, sase supports a concrete provider/model override that acts as a
temporary session-level default. The ACE ,o chord opens the dual-lane Model Overrides panel for primary and worker
overrides (see docs/ace.md for the TUI flow).
The temporary override only changes the default provider/model selection for new agent launches. It does not override:
- Already-running agents: they keep whatever provider/model they were launched with.
- Explicit %model prompt directives: they still take precedence.
- An explicit provider_name= argument to invoke_agent(): it still wins.
SASE_MODEL_TIER_OVERRIDE / SASE_MODEL_SIZE_OVERRIDE still force the tier for tier-based launches. A concrete
temporary override supplies a provider and model directly, so it is used only when no explicit model/provider was
requested.
Resolution Order (default provider/model)¶
When no %model directive and no explicit provider_name are present, the default is resolved as:
- Active primary temporary override at ~/.sase/llm_override.json (if not expired).
- llm_provider.provider from the merged sase.yml config.
- Auto-detection by plugin-declared priority (built-ins: claude, codex, qwen, opencode, then agy).
A concrete temporary override sets both the default provider and a concrete model_override for the next launch — so
the agent metadata (running marker, plan review badge, agent rows) reflects the actual model that will run, not just the
configured default.
State File¶
{
"provider": "opencode",
"model": "anthropic/claude-sonnet-4-5",
"raw_model": "opencode/anthropic/claude-sonnet-4-5",
"created_at": 1777470000.0,
"expires_at": 1777473600.0,
"source": "ace",
"pre_override_provider": "claude",
"pre_override_model": "opus",
"pre_override_raw_model": "opus"
}
| Field | Type | Description |
|---|---|---|
| provider | str | Resolved provider name (e.g. "claude", "codex", "opencode"). |
| model | str | Concrete model passed to the provider (e.g. "o3", "opus"). |
| raw_model | str | Original user input (e.g. "codex/o3", "opencode/anthropic/..."). |
| created_at | float | Unix timestamp when the override was set. |
| expires_at | float \| None | Unix timestamp when the override expires; null means "until cleared". |
| source | str | Free-form tag indicating who set the override (e.g. "ace"). |
| pre_override_provider | str \| None | Snapshot of the effective provider before the override was set. Used to resolve the reserved "other" alias dynamically. |
| pre_override_model | str \| None | Snapshot of the effective model before the override. Pairs with pre_override_provider to form the "other" target. |
| pre_override_raw_model | str \| None | Cosmetic copy of the displaced model's raw user-input form. May be None on legacy state files written before this field existed. |
Writes are atomic (temp file + os.replace). Reads are best-effort self-cleaning: an expired or unparseable file is
deleted on next access, so a forgotten override never lingers past its expires_at, even with no TUI running.
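The write and read behavior described above can be sketched with the standard library. This is an illustration of the temp file + os.replace pattern and the self-cleaning read, not the code in temporary_override.py; the function names are hypothetical, and treating a missing expires_at key as malformed is an assumption.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Any, Optional


def write_state_atomically(path: Path, state: dict[str, Any]) -> None:
    """Write JSON to a temp file in the same directory, then swap it in."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
        os.replace(tmp_name, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_active_state(path: Path, now: Optional[float] = None) -> Optional[dict[str, Any]]:
    """Return the state if present and unexpired; delete stale or broken files."""
    now = time.time() if now is None else now
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
        expires_at = state["expires_at"]
        if expires_at is not None and float(expires_at) <= now:
            raise ValueError("expired")
    except FileNotFoundError:
        return None
    except (OSError, ValueError, KeyError, TypeError):
        # Unparseable, malformed, or expired: clean up so it never lingers.
        path.unlink(missing_ok=True)
        return None
    return state
```

Creating the temp file in the destination directory keeps os.replace on a single filesystem, which is what makes the swap atomic.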
Model Resolution¶
The user-supplied raw_model is normalized through the same rules as %model:
- provider/model selects the provider explicitly (e.g. codex/o3 or opencode/anthropic/claude-sonnet-4-5).
- A bare known model name infers its provider from plugin metadata (e.g. sonnet → claude).
- An unknown bare model is accepted and runs on the current default provider, matching %model behavior.
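A minimal sketch of these three rules follows. It assumes the set of registered provider names and a model-to-provider map are available as plain collections; normalize_raw_model and both parameters are hypothetical stand-ins for the plugin metadata lookups.

```python
from typing import Mapping, Set, Tuple


def normalize_raw_model(
    raw_model: str,
    providers: Set[str],
    known_models: Mapping[str, str],
    default_provider: str,
) -> Tuple[str, str]:
    """Return (provider, model) for a user-supplied raw_model string."""
    prefix, sep, rest = raw_model.partition("/")
    # Only the first segment is treated as a provider, so nested
    # provider-local paths such as anthropic/claude-sonnet-4-5 survive intact.
    if sep and rest and prefix in providers:
        return prefix, rest
    if raw_model in known_models:
        return known_models[raw_model], raw_model
    # Unknown bare model: accept it and run on the current default provider.
    return default_provider, raw_model
```

For example, opencode/anthropic/claude-sonnet-4-5 splits into provider opencode and model anthropic/claude-sonnet-4-5, because only the leading segment is checked against the provider set.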
Duration Parsing¶
Durations accept compact unit suffixes: 15m, 1h, 1h30m, 90m, 2h15m30s. Bare integers are interpreted as
minutes (45 → 45 minutes). The case-insensitive sentinel until cleared (or until_cleared) means "no expiry —
persists until the user clears it from the TUI or another sase process clears the state file."
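The accepted duration forms can be sketched as follows. This is an illustration of the documented inputs rather than the body of parse_override_duration(); the name parse_duration_sketch is hypothetical, and rejecting unrecognized input with ValueError is an assumption.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+)([hms])")


def parse_duration_sketch(value: str) -> Optional[int]:
    """Return seconds, or None for the "until cleared" sentinel."""
    text = value.strip().lower()
    if text in ("until cleared", "until_cleared"):
        return None  # no expiry
    if text.isdigit():
        return int(text) * 60  # bare integers are minutes
    if not re.fullmatch(r"(?:\d+[hms])+", text):
        raise ValueError(f"unrecognized duration: {value!r}")
    return sum(int(amount) * _UNIT_SECONDS[unit] for amount, unit in _PART.findall(text))
```

Under these rules 1h30m and 90m both yield 5400 seconds, and 45 yields 2700.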
Public API¶
The override primitives live in src/sase/llm_provider/temporary_override.py:
| Function | Purpose |
|---|---|
| get_active_temporary_override(now=None, role=...) | Read the active primary or worker override (auto-deletes expired/malformed files). |
| set_temporary_override(raw, dur, source=, role=...) | Write a new primary or worker override, replacing any existing one for that lane. |
| clear_temporary_override(role=...) | Remove the lane's override file. Safe to call when nothing is active. |
| parse_override_duration(value) | Parse a user-facing duration string into seconds (or None). |
| resolve_effective_default_provider_model() | Centralized helper used by metadata pre-resolution paths. |
| resolve_effective_worker_provider_model() | Resolve the worker lane: worker override, matching worker_models, then fallback. |
Examples¶
- ACE chord ,o, pick codex/o3, duration 1h → ~/.sase/llm_override.json is written; new launches default to CODEX(o3) for the next hour.
- ACE chord ,o, pick opencode/anthropic/claude-sonnet-4-5, duration 1h → new launches default to OPENCODE(anthropic/claude-sonnet-4-5).
- ACE chord ,o, pick sonnet, duration 30m → known bare model; provider resolves to claude via plugin metadata.
- ACE chord ,o, choose Clear override → ~/.sase/llm_override.json is removed; defaults revert to permanent config / autodetect.
- ACE chord ,o, set worker override to codex/gpt-5.5 for 1h → ~/.sase/llm_worker_override.json is written; new %model:worker launches use CODEX(gpt-5.5) until the override expires or is cleared.
Environment Variables¶
Complete reference of environment variables used by the LLM provider layer.
Generic (Provider-Agnostic)¶
| Variable | Description |
|---|---|
| SASE_LLM_LARGE_ARGS | Extra CLI args for large tier invocations |
| SASE_LLM_SMALL_ARGS | Extra CLI args for small tier invocations |
| SASE_MODEL_TIER_OVERRIDE | Force all invocations to a specific model tier |
| SASE_MODEL_SIZE_OVERRIDE | Legacy alias for SASE_MODEL_TIER_OVERRIDE |
Claude-Specific¶
| Variable | Description |
|---|---|
| SASE_CLAUDE_LARGE_ARGS | Claude-specific extra args for large tier |
| SASE_CLAUDE_SMALL_ARGS | Claude-specific extra args for small tier |
Codex-Specific¶
| Variable | Description |
|---|---|
| SASE_CODEX_PATH | Path to the Codex CLI binary |
| SASE_CODEX_LARGE_ARGS | Codex-specific extra args for large tier |
| SASE_CODEX_SMALL_ARGS | Codex-specific extra args for small tier |
| SASE_CODEX_DISABLE_SHADOW_HOME | Set to 1 to disable the disposable Codex home |
Qwen-Specific¶
| Variable | Description |
|---|---|
| SASE_QWEN_PATH | Path to the Qwen Code CLI binary |
| SASE_QWEN_LARGE_ARGS | Qwen-specific extra args for large tier |
| SASE_QWEN_SMALL_ARGS | Qwen-specific extra args for small tier |
Antigravity (agy)-Specific¶
| Variable | Description |
|---|---|
| SASE_AGY_PATH | Path to the Antigravity CLI binary (default: "agy"). |
| SASE_AGY_PRINT_TIMEOUT | Override the agy --print-timeout Go duration (default: "24h"). |
| SASE_AGY_LARGE_ARGS | Antigravity-specific extra args for large tier |
| SASE_AGY_SMALL_ARGS | Antigravity-specific extra args for small tier |
OpenCode-Specific¶
| Variable | Description |
|---|---|
| SASE_OPENCODE_PATH | Path to the OpenCode CLI binary |
| SASE_OPENCODE_LARGE_ARGS | OpenCode-specific extra args for large tier |
| SASE_OPENCODE_SMALL_ARGS | OpenCode-specific extra args for small tier |
External provider plugins document their own environment variables in their respective repos.
VCS Provider¶
| Variable | Description |
|---|---|
| SASE_VCS_PROVIDER | Override VCS provider ("git", "hg", or "auto") |
CLI Flags¶
ace¶
| Flag | Values | Description |
|---|---|---|
| -m, --model-tier | large, small | Override model tier for all LLM invocations |
| --model-size | big, little | Deprecated alias for --model-tier |
| --vcs-provider | git, hg, auto | Override VCS provider |
axe¶
| Flag | Values | Description |
|---|---|---|
| --vcs-provider | git, hg, auto | Override VCS provider |
The ace command wires --model-tier / --model-size into the model_tier_override parameter of AceApp. The
--vcs-provider flag is wired to the SASE_VCS_PROVIDER environment variable for downstream resolution.
Retry and Fallback¶
The LLM provider layer supports per-provider retry and fallback configuration. When an agent encounters a retryable error, it can automatically wait and retry, then optionally fall back to an alternate model.
Configuration¶
Retry behavior is configured per provider under llm_provider.retry in sase.yml:
llm_provider:
  retry:
    claude:
      max_retries: 3
      error_patterns:
        - "API Error: 500"
      wait_times: [60, 300, 1800]
      fallback_model: "sonnet"
Config Fields¶
| Field | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 0 | Maximum retry attempts. 0 disables retrying. |
| error_patterns | list[str] | [] | Case-insensitive substring patterns matched against error output. |
| wait_times | list[int] | [30] | Per-retry wait times in seconds. Last value reused if list is too short. |
| fallback_model | str \| null | null | Alternate model to use after exhausting all retries. |
| continuation_prompt | str \| null | null | Text prepended to state.current_prompt on every retry (used to nudge the agent). |
| preserve_workspace | bool | false | Preserve on-disk edits across legacy in-process retry attempts. |
| spawn_new_agent | bool | false | Opt in to spawn-on-retry: a retryable error spawns a fresh detached child agent (as if sase run -d had been invoked) instead of in-process retry. See Spawn-on-Retry below. |
Default Configuration¶
Retry defaults can come from two places: configured policy under llm_provider.retry and provider-supplied defaults
from the llm_default_retry_config() hook. The bundled default_config.yml already provides configured policy for
Claude and Codex; user config can replace or extend it through the normal config merge.
Claude:
- max_retries: 3
- error_patterns: ["API Error: 500", "API Error: 529", "Internal server error", "overloaded_error"]
- wait_times: [60, 300, 1800] (1 min, 5 min, 30 min)
- fallback_model: "sonnet"
Codex:
- max_retries: 3
- error_patterns: ["exceeded retry limit", "429 Too Many Requests", "Too Many Requests", "rate limit", "failed to connect to websocket"]. These cover the Codex CLI's own give-up message, the terminal rate-limit status, and the transient websocket transport error. A bare 403 Forbidden is deliberately excluded so a persistent auth failure is not retried forever.
- wait_times: [60, 300, 1800] (1 min, 5 min, 30 min); rate limits need a real cool-down
Provider-Supplied Retry Defaults¶
Providers can also declare retry defaults through the llm_default_retry_config() hook. Both Claude and Codex declare a
recovery entry that is merged with their configured policy.
Claude:
- error patterns: "Prompt is too long", "socket connection was closed unexpectedly", and "API Error"
- max_retries: 3
- wait_times: [0], used only when no config layer supplies wait_times; the bundled Claude policy supplies [60, 300, 1800], so that is the out-of-the-box backoff
- continuation_prompt: A short nudge that tells the coder to inspect git status / git diff before resuming, since prior edits are preserved on disk after a context-limit, socket-close, or API-error retry
- preserve_workspace: true
Codex:
- error patterns: "exceeded retry limit", "429 Too Many Requests", "Too Many Requests", "rate limit", and "failed to connect to websocket". These cover the transient transport / rate-limit failure mode where the Codex CLI exhausts its own internal reconnects and exits non-zero
- max_retries: 3
- wait_times: [60, 300, 1800] (the bundled Codex policy supplies the same backoff)
- continuation_prompt: The same git status / git diff resume nudge as Claude
- preserve_workspace: true
Configured llm_provider.retry.<provider> values are merged on top of provider-supplied defaults: explicit falsy values
(max_retries: 0 to opt out entirely, continuation_prompt: "" to disable the nudge) override the built-in via
key-presence checks. error_patterns is a de-duplicated union of built-in and configured lists.
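The merge rule can be sketched with plain dictionaries. This is an illustrative sketch, not the implementation in retry_config.py: merge_retry_config is a hypothetical name, and de-duplicating error_patterns by exact string match is an assumption.

```python
from typing import Any, Mapping


def merge_retry_config(
    builtin: Mapping[str, Any],
    configured: Mapping[str, Any],
) -> dict[str, Any]:
    """Overlay configured retry values on provider-supplied defaults."""
    merged = dict(builtin)
    for key, value in configured.items():
        if key != "error_patterns":
            # Key presence decides, so explicit 0 or "" still overrides.
            merged[key] = value
    patterns = list(builtin.get("error_patterns", [])) + list(configured.get("error_patterns", []))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    merged["error_patterns"] = list(dict.fromkeys(patterns))
    return merged
```

Checking for key presence rather than truthiness is what lets max_retries: 0 and continuation_prompt: "" act as opt-outs instead of being mistaken for "not configured".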
On every retry attempt the continuation_prompt (if non-empty) is idempotently prepended to state.current_prompt
before the next invocation — the prepend is gated on a startswith check so repeated retries don't stack duplicate
nudges. Workspaces are preserved across Claude's built-in context-limit, socket-close, and API-error retries (no
workspace wipe), so on-disk edits remain available to the restarted session.
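The startswith gate can be shown in a few lines. This sketch uses a hypothetical prepend_continuation helper, and the blank-line separator between nudge and prompt is an assumption.

```python
def prepend_continuation(current_prompt: str, continuation_prompt: str) -> str:
    """Prepend the nudge once; repeated calls leave the prompt unchanged."""
    if not continuation_prompt or current_prompt.startswith(continuation_prompt):
        return current_prompt
    return f"{continuation_prompt}\n\n{current_prompt}"
```

Applying the function twice gives the same result as applying it once, so a third retry does not carry three copies of the nudge.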
Retry Flow¶
Error detected
│
├── Does error match error_patterns? (case-insensitive substring)
│ ├── No → fail immediately
│ └── Yes → retry_count < max_retries?
│ ├── Yes → wait (wait_times[retry_count]) → retry
│ └── No → fallback_model configured and not already using fallback?
│ ├── Yes → set fallback model override → retry once
│ └── No → fail
Wait periods are interruptible — if the agent is killed during a wait, it stops immediately.
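The decision tree above can be sketched as a pure function. RetryPolicy here is a hypothetical stand-in for the real config object, and the returned strings are only for illustration; the sketch shows case-insensitive substring matching, reuse of the last wait_times value, and the single fallback attempt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RetryPolicy:  # hypothetical stand-in for the real retry config
    max_retries: int = 0
    error_patterns: List[str] = field(default_factory=list)
    wait_times: List[int] = field(default_factory=lambda: [30])
    fallback_model: Optional[str] = None


def is_retryable(error_text: str, policy: RetryPolicy) -> bool:
    """Case-insensitive substring match against the configured patterns."""
    lowered = error_text.lower()
    return any(pattern.lower() in lowered for pattern in policy.error_patterns)


def wait_for_attempt(retry_count: int, policy: RetryPolicy) -> int:
    """Pick the wait for this retry, reusing the last value if the list is short."""
    times = policy.wait_times or [30]
    return times[min(retry_count, len(times) - 1)]


def next_action(error_text: str, retry_count: int, using_fallback: bool, policy: RetryPolicy) -> str:
    if not is_retryable(error_text, policy):
        return "fail"
    if retry_count < policy.max_retries:
        return f"wait {wait_for_attempt(retry_count, policy)}s then retry"
    if policy.fallback_model and not using_fallback:
        return f"retry once with {policy.fallback_model}"
    return "fail"
```

Keeping the decision separate from the sleeping and invoking code makes each branch of the tree easy to check in isolation.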
TUI Display¶
The ACE Agents tab reflects retry state (see Retry/Fallback Display):
- RETRYING (Ns): Waiting before the next attempt (bold orange, with countdown)
- ↻N: Retry count annotation on running agents
- ▸Model: Fallback model annotation (e.g., ↻3▸flash)
Metadata Tracking¶
If any retries occurred or a fallback model was used, retry metadata is written to done.json in the agent's artifacts
directory after execution completes (runs that succeed on the first attempt omit these fields):
{
"retry_count": 2,
"retry_errors": ["An unexpected critical error occurred: ..."],
"used_fallback": false
}
When used_fallback is true, the metadata also includes the fallback_model that served the final attempt.
Source: src/sase/llm_provider/retry_config.py, src/sase/axe/run_agent_exec_finalize.py
Spawn-on-Retry¶
When ProviderRetryConfig.spawn_new_agent=True, a retryable error spawns a fresh detached child agent (as if
sase run -d had been invoked) instead of running the next attempt in-process. The failing parent transfers its
workspace claim to the child via transfer_workspace_claim() and exits with status FAILED (RETRIED). This trades the
small cost of a fresh process for two benefits:
- The workspace is preserved by design: the child skips prepare_workspace() and inherits the parent's in-progress edits via the transferred workspace claim. (Legacy in-process retry runs prepare_workspace() between attempts and wipes uncommitted file edits unless preserve_workspace=True.)
- A retry boundary becomes a real process boundary, which is more robust against memory leaks, lingering child processes, and stale interpreter state.
Linkage fields (written to both agent_meta.json and done.json so retry chains are queryable from either side):
| Field | Meaning |
|---|---|
| retry_of_timestamp | Backward link: the parent agent's run timestamp. |
| retried_as_timestamp | Forward link: the child agent's run timestamp (written on the parent at handoff). |
| retry_chain_root_timestamp | The root agent's timestamp, stable across the entire chain. |
| retry_attempt | Depth in the chain (1-based). |
State is carried across the boundary by a retry_handoff.json file written to the parent's artifacts directory; the
child reads it before launch.
Fallback behavior: spawn-on-retry is opt-in (default false). If spawning fails (e.g. workspace transfer fails),
the legacy in-process retry runs as a fallback so the user is never worse off.
Source: src/sase/axe/run_agent_retry_spawn.py, src/sase/llm_provider/retry_config.py
Legacy Thinking Metadata¶
Older parser helpers can still read provider thinking/reasoning artifacts when a caller uses them directly. For Claude
extended-thinking events whose thinking text is empty but whose payload contains an opaque signature, those helpers
produce an encrypted-thinking placeholder instead of hiding the block. When Claude also reports
message.usage.output_tokens, the placeholder includes an approximate output-token count so the caller can tell that
reasoning occurred even though the raw thought text is not available. The Agents tab now uses the Tools panel for
provider tool activity instead of exposing these thinking helpers as a panel.
Token Usage Tracking¶
The LLM provider layer tracks token usage for providers that emit parseable usage events. Claude and Qwen usage is read
from their stream-json result events. OpenCode usage is accumulated from step_finish token counters. Codex currently
captures assistant text and reasoning summaries but does not emit usage.json.
When usage is available, input tokens, output tokens, cache-creation tokens, and cache-read tokens are persisted as a
usage.json artifact in the agent run directory.
Artifact Format¶
{
"input_tokens": 12345,
"output_tokens": 6789,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 3456
}
When telemetry is enabled, token counts are also recorded as Prometheus counters (sase_llm_input_tokens_total,
sase_llm_output_tokens_total, sase_llm_cache_read_tokens_total) for monitoring and dashboards. See
docs/telemetry.md for the full telemetry reference.
Source: src/sase/llm_provider/_subprocess.py, src/sase/llm_provider/types.py
Prompt Preprocessing Pipeline¶
Before any prompt reaches a provider, it passes through the shared preprocessing pipeline defined in preprocessing.py.
The pipeline has an early phase used for xprompt expansion and directive extraction, then a late phase used for command,
file, template, and formatting work.
Steps¶
| Phase | Step | Syntax | Description |
|---|---|---|---|
| Early | Optional workflow Jinja2 | {{ var }} | Render workflow-supplied template context before xprompt |
| Early | xprompt references | #name | Expand reusable prompt snippets or workflows |
| Early | Prompt directives | %model, %m, other %... directives | Extract directives after xprompt expansion |
| Late | Disabled/fenced protection | %xprompts_enabled:false, fenced code | Protect regions that should not be rewritten |
| Late | Command substitution | $(cmd) | Execute shell commands and inline their output |
| Late | File references | @path | Process, validate, or skip file references |
| Late | Top-level Jinja2 | {{ var }} | Render remaining top-level Jinja2 templates |
| Late | Prettier formatting | - | Format with prettier for consistent markdown |
| Late | Comment stripping | <!-- ... --> | Remove HTML/markdown comments |
| Late | Restore protected regions | fenced code / disabled-region placeholders | Restore protected content after rewrites |
Order Matters¶
The pipeline runs in strict order. Prompt directives are extracted after xprompt expansion, so directives embedded in xprompts are honored. Late-phase command substitution and file-reference processing run with fenced blocks protected, so examples inside code fences are not executed or rewritten.
Home Mode¶
When is_home_mode=True, file-reference processing skips copy side effects. This is used when the invocation doesn't
need workspace-local copies from @path references.
Source Functions¶
The preprocessing steps delegate to functions from two libraries:
- xprompt: process_xprompt_references(), extract_prompt_directives(), is_jinja2_template(), render_toplevel_jinja2()
- file_references: process_command_substitution(), process_file_references(), validate_file_references(), format_with_prettier(), strip_html_comments()
Subprocess Streaming¶
Providers use shared helpers in _subprocess.py and the _subprocess_* modules to stream LLM output in real time.
Plain text, JSON-line, and provider-specific parsers share the same artifact hooks for live replies and usage files.
Mechanism¶
- The provider spawns the CLI tool via subprocess.Popen. Providers that consume prompts from stdin set stdin=PIPE; OpenCode passes the prompt as the final opencode run argument.
- The prompt is supplied using the provider's documented transport, either stdin or an argv message argument.
- Stdout and stderr are set to non-blocking mode via os.set_blocking().
- A select.select() loop with a 0.1s timeout polls for readable data on both streams.
- Lines are read, parsed when needed, and optionally printed to the console in real time.
- After the process exits (process.poll() is not None), any remaining buffered output is drained.
- Helpers return stdout/assistant text, stderr diagnostics, return code, and usage data when the provider reports it.
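The mechanism above can be sketched with the standard library alone. This is a simplified, POSIX-only illustration and not the code in _subprocess.py: stream_process is a hypothetical name, the prompt is always fed over stdin, output is handled in raw chunks rather than parsed lines, and no usage data is collected.

```python
import os
import select
import subprocess
import sys
import threading
from typing import Dict, List, Optional, Tuple


def _feed_stdin(pipe, data: bytes) -> None:
    """Write the prompt on a helper thread so a large prompt cannot stall reads."""
    try:
        pipe.write(data)
        pipe.close()
    except OSError:  # includes BrokenPipeError when the child exits early
        pass


def _read_now(fd: int) -> Optional[bytes]:
    """Return data, b"" at end of stream, or None when nothing is ready."""
    try:
        return os.read(fd, 65536)
    except BlockingIOError:
        return None


def stream_process(argv: List[str], prompt: str = "", echo: bool = True) -> Tuple[str, str, int]:
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    threading.Thread(target=_feed_stdin, args=(proc.stdin, prompt.encode("utf-8")), daemon=True).start()

    out_fd, err_fd = proc.stdout.fileno(), proc.stderr.fileno()
    captured: Dict[int, List[bytes]] = {out_fd: [], err_fd: []}
    for fd in captured:
        os.set_blocking(fd, False)

    def take(fd: int) -> Optional[bytes]:
        data = _read_now(fd)
        if data:
            captured[fd].append(data)
            if echo and fd == out_fd:
                sys.stdout.write(data.decode("utf-8", errors="replace"))
                sys.stdout.flush()
        return data

    live = [out_fd, err_fd]
    while proc.poll() is None:
        readable, _, _ = select.select(live, [], [], 0.1)
        for fd in readable:
            if take(fd) == b"":
                live.remove(fd)  # end of stream: stop polling this descriptor

    for fd in live:  # drain whatever is still buffered after exit
        while take(fd):
            pass

    proc.stdout.close()
    proc.stderr.close()
    stdout = b"".join(captured[out_fd]).decode("utf-8", errors="replace")
    stderr = b"".join(captured[err_fd]).decode("utf-8", errors="replace")
    return stdout, stderr, proc.returncode
```

Passing echo=False corresponds to the suppressed-output case: the text is still captured and returned, just not printed as it arrives.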
Live Reply File¶
When SASE_ARTIFACTS_DIR is set, the streaming output is also written in real-time to
<SASE_ARTIFACTS_DIR>/live_reply.md. This file is used by the ACE TUI Agents tab to display the agent's reply as it
streams in, and remains available after execution completes for the metadata panel's AGENT REPLY section.
Providers that support richer streams may write companion artifacts. Codex writes reasoning summaries to
<SASE_ARTIFACTS_DIR>/codex_thinking.jsonl; providers with token counters write <SASE_ARTIFACTS_DIR>/usage.json.
Output Suppression¶
When suppress_output=True, lines are still captured but not printed to the console. This is used for background
invocations where the caller only needs the final result.
Postprocessing¶
After a provider returns (or raises an error), the orchestration layer runs postprocessing steps.
On Success (postprocess_success)¶
- Audio notification: Plays a sound via run_bam_command("Agent reply received") (skipped if suppress_output).
- Log to sase.md: Appends a timestamped entry with the prompt and response to <artifacts_dir>/sase.md (if artifacts_dir is set).
- Save chat history: Writes to ~/.sase/chats/ if workflow is set. See Chat History.
On Error (postprocess_error)¶
- Rich error display: Prints the prompt and error via print_prompt_and_response() with an _ERROR suffix on the agent type label (skipped if suppress_output).
- Log to sase.md: Same as success, but the response is the error message and the agent type gets an _ERROR suffix.
- Save error chat history: Writes to ~/.sase/chats/ with an _ERROR agent suffix.
sase.md Log Format¶
Each entry in the log file follows this format:
## <timestamp> - <agent_type> - iteration <N> - tag <workflow_tag>
### PROMPT:
\`\`\` <prompt text> \`\`\`
### RESPONSE:
\`\`\` <response text> \`\`\`
---
Prompt File Saving¶
Before invocation, the preprocessed prompt is saved to <artifacts_dir>/<agent_type>_prompt.md (or
<agent_type>_iter_<N>_prompt.md if an iteration number is set). This allows reviewing the exact prompt that was sent.
Chat History¶
Chat histories are stored as markdown files in ~/.sase/chats/.
File Naming¶
<branch_or_workspace>-<workflow>-[<agent>-]<timestamp>.md
| Part | Source | Example |
|---|---|---|
| branch_or_workspace | Output of branch_or_workspace_name | my_feature |
| workflow | Workflow name, normalized | crs, run |
| agent | Agent type (omitted if same as workflow) | editor, planner |
| timestamp | YYmmdd_HHMMSS format | 260214_153042 |
Dashes and slashes in workflow names are normalized to underscores.
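The naming scheme can be sketched as a small helper. chat_history_filename is a hypothetical name, and comparing the agent against the normalized workflow name (rather than the raw one) is an assumption.

```python
import re
from datetime import datetime
from typing import Optional


def chat_history_filename(
    branch_or_workspace: str,
    workflow: str,
    agent: Optional[str],
    when: datetime,
) -> str:
    """Build <branch_or_workspace>-<workflow>-[<agent>-]<timestamp>.md."""
    # Dashes and slashes in the workflow name become underscores.
    workflow_part = re.sub(r"[-/]", "_", workflow)
    parts = [branch_or_workspace, workflow_part]
    if agent and agent != workflow_part:
        parts.append(agent)  # omitted when it would just repeat the workflow
    parts.append(when.strftime("%y%m%d_%H%M%S"))
    return "-".join(parts) + ".md"
```

Normalizing the workflow name keeps the dash free to act as the separator between the filename parts.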
File Format¶
# Chat History - <workflow> (<agent>)
**Timestamp** <display_timestamp>
**MODEL** <provider>/<model>
**AGENT** <sase_agent_name>
## Previous Conversation
<previous history if resuming>
---
## Prompt
<prompt text>
## Response
<response text>
The MODEL and AGENT blocks are omitted when the invocation did not provide that metadata. MODEL can contain just a
model name, just a provider name, or both. When both provider and model are known, it is rendered as
<provider>/<model> unless the model already includes that prefix.
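The MODEL rendering rule reduces to a few branches. The helper below is a hypothetical sketch of that rule, returning None when the block should be omitted.

```python
from typing import Optional


def render_model_label(provider: Optional[str], model: Optional[str]) -> Optional[str]:
    """Return the MODEL value, or None when neither part is known."""
    if provider and model:
        # Avoid doubling the prefix when the model already carries it.
        return model if model.startswith(f"{provider}/") else f"{provider}/{model}"
    return model or provider or None
```

For instance, provider codex with model o3 renders as codex/o3, while a model that already starts with its provider prefix is left unchanged.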
Resume Support¶
The sase run --resume flag resumes a previous conversation by agent name. The #fork workflow resolves the agent name
to its artifacts directory, extracts the response path from done.json, and delegates to #fork_by_chat which loads
the chat history and prepends it to the new conversation. The --resume flag also accepts a history file basename or
full path for direct chat-file-based resumption via the #fork_by_chat workflow.
Fork expansion is recursive: if the loaded chat history itself contains #fork or #fork_by_chat references, those are
expanded inline as well. Legacy #resume and #resume_by_chat references in old transcripts are still recognized.
Cycle detection prevents infinite loops when chat histories reference each other.
Invocation Lifecycle¶
The invoke_agent() function in _invoke.py orchestrates the complete lifecycle of an LLM invocation. Here is the
end-to-end flow:
invoke_agent(prompt, agent_type, model_tier, ...)
│
├── 1. Handle deprecated model_size → model_tier mapping
├── 2. Check SASE_MODEL_TIER_OVERRIDE / SASE_MODEL_SIZE_OVERRIDE env vars
├── 3. Build LoggingContext from parameters
│
├── 4. Preprocess prompt unless skip_preprocessing=True
│ ├── early phase: optional workflow Jinja2, xprompt expansion, directive extraction
│ └── late phase: command substitution, file refs, top-level Jinja2, formatting, comment stripping
│
├── 5. Resolve %model / temporary provider-model override
├── 6. Display decision counts (if not suppressed)
├── 7. Print prompt via Rich (if not suppressed)
├── 8. Generate or use provided timestamp
├── 9. Save prompt to artifacts directory
│
├── 10. Get provider from registry and invoke
│ ├── Build CLI command with flags
│ ├── Spawn subprocess (Popen)
│ ├── Supply prompt via provider transport
│ └── Stream stdout/stderr in real-time
│
├── 11. Run commit finalizer for SASE agent sessions
│ ├── Skip when disabled or outside an agent session
│ ├── Check main workspace and configured Git linked repos
│ ├── Treat static linked repos as advisory dirty targets
│ ├── Auto-commit exact tracked SDD done-status closeouts
│ └── Run bounded follow-up provider invocations until enforced repos are clean or failed
│
├── 12. Postprocess
│ ├── Success path:
│ │ ├── Audio notification
│ │ ├── Log to sase.md
│ │ └── Save chat history
│ └── Error path:
│ ├── Rich error display
│ ├── Log error to sase.md
│ └── Save error chat history
│
└── 13. Return AIMessage(content=response), or raise LLMInvocationError on failure
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | (required) | Raw prompt to send |
| agent_type | str | (required) | Agent type label (e.g., "editor") |
| model_tier | ModelTier | "large" | Model tier to use |
| model_size | "big" \| "little" \| None | None | Deprecated, use model_tier |
| iteration | int \| None | None | Iteration number for logging |
| workflow_tag | str \| None | None | Workflow tag for logging |
| artifacts_dir | str \| None | None | Directory for sase.md, prompt, and stream files |
| workflow | str \| None | None | Workflow name for chat history |
| suppress_output | bool | False | Suppress console output |
| timestamp | str \| None | None | Shared timestamp (YYmmdd_HHMMSS) |
| is_home_mode | bool | False | Skip file copying for @ references |
| branch_or_workspace | str \| None | None | Override the chat-history filename prefix |
| decision_counts | dict[str, Any] \| None | None | Planning agent decision counts |
| provider_name | str \| None | None | Override provider (default from config) |
| skip_preprocessing | bool | False | Use prompt as already-preprocessed input |
| directives | PromptDirectives \| None | None | Pre-extracted directives for skip_preprocessing |
Return Value¶
On success, returns an AIMessage (from langchain_core.messages) whose content is the provider response. On
provider failure, invoke_agent() logs the error and raises LLMInvocationError with the formatted error text.