LLM Provider Integration

This document describes the LLM provider abstraction layer in sase. The system supports pluggable LLM backends (Claude Code, Codex, Gemini CLI, Qwen Code, and OpenCode are bundled; additional providers can ship as external plugins) behind a shared orchestration layer that handles preprocessing, invocation, and postprocessing.

Overview

The LLM provider layer decouples prompt handling from the underlying LLM backend. All providers share a common preprocessing pipeline, subprocess streaming mechanism, and postprocessing workflow. The actual LLM invocation is delegated to a pluggable provider selected at runtime.

Key design principles:

  • Providers are thin: They only construct CLI commands and run subprocesses. All preprocessing and postprocessing lives in the shared orchestration layer.
  • Registry-based selection: Providers register themselves by name and are resolved via config or explicit override.
  • Tier-based model selection: Callers request a "large" or "small" tier; the provider maps it to a concrete model.

Source Layout

File Purpose
src/sase/llm_provider/__init__.py Public API exports
src/sase/llm_provider/base.py LLMProvider abstract base class
src/sase/llm_provider/_hookspec.py Pluggy hook specifications (LLMHookSpec)
src/sase/llm_provider/_plugin_manager.py Plugin manager wrapping pluggy (LLMPluginManager)
src/sase/llm_provider/claude.py Claude Code provider implementation
src/sase/llm_provider/gemini.py Gemini CLI provider implementation
src/sase/llm_provider/qwen.py Qwen Code provider implementation
src/sase/llm_provider/opencode.py OpenCode provider implementation
src/sase/llm_provider/registry.py Provider registration and lookup
src/sase/llm_provider/config.py Config file reader (sase.yml)
src/sase/llm_provider/types.py ModelTier, LoggingContext types
src/sase/llm_provider/_invoke.py invoke_agent() orchestrator
src/sase/llm_provider/_subprocess.py stream_process_output()
src/sase/llm_provider/codex.py Codex CLI provider implementation
src/sase/llm_provider/_plan_utils.py Shared plan utilities
src/sase/llm_provider/preprocessing.py 6-step preprocessing pipeline
src/sase/llm_provider/postprocessing.py Logging, chat history, audio
src/sase/llm_provider/retry_config.py ProviderRetryConfig (per-provider retry defaults)

Provider Architecture

Base Class

All providers implement the LLMProvider abstract base class:

class LLMProvider(ABC):
    @abstractmethod
    def invoke(
        self,
        prompt: str,
        *,
        model_tier: ModelTier,
        suppress_output: bool = False,
    ) -> str: ...
Parameter Type Description
prompt str Already-preprocessed prompt text
model_tier ModelTier "large" or "small"
suppress_output bool If True, suppress real-time console output

Returns the raw response text. Raises subprocess.CalledProcessError on failure.
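
As an illustrative sketch (not a bundled provider), a minimal subclass might look like the following. The class name is hypothetical and `cat` stands in for a real LLM CLI; only the base class and types modules come from the source layout above.

import subprocess

from sase.llm_provider.base import LLMProvider
from sase.llm_provider.types import ModelTier


class EchoProvider(LLMProvider):
    """Hypothetical provider: `cat` stands in for an LLM CLI that reads the prompt on stdin."""

    _MODELS = {"large": "echo-large", "small": "echo-small"}

    def invoke(
        self,
        prompt: str,
        *,
        model_tier: ModelTier,
        suppress_output: bool = False,
    ) -> str:
        model = self._MODELS[model_tier]  # a real provider would pass this as a --model flag
        # check=True mirrors the contract: subprocess.CalledProcessError on non-zero exit.
        result = subprocess.run(["cat"], input=prompt, capture_output=True, text=True, check=True)
        if not suppress_output:
            print(result.stdout, end="")
        return result.stdout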

Registry

Providers are registered by name in a global registry (registry.py) and discovered via importlib.metadata.entry_points(group="sase_llm"). Built-in entries live in pyproject.toml:

[project.entry-points."sase_llm"]
claude = "sase.llm_provider.claude:ClaudeCodeProvider"
codex  = "sase.llm_provider.codex:CodexProvider"
gemini = "sase.llm_provider.gemini:GeminiProvider"
opencode = "sase.llm_provider.opencode:OpenCodeProvider"
qwen   = "sase.llm_provider.qwen:QwenProvider"

External plugin packages declare additional entries under the same group.

To get a provider instance:

provider = get_provider()          # Uses default from config
provider = get_provider("claude")  # Explicit provider name

Selection Logic

  1. If provider_name is passed to invoke_agent(), use that.
  2. Otherwise, read the llm_provider.provider field from ~/.config/sase/sase.yml.
  3. If no config exists (or provider is empty), auto-detect by walking registered plugins in ascending llm_autodetect_priority() order and picking the first whose llm_autodetect_cli_name() is on PATH. Built-in priorities: claude=0, codex=10, qwen=15, opencode=18, gemini=30. External plugins slot in by declaring their own priority.
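
A simplified sketch of that auto-detect walk (step 3), assuming plugin objects exposing the two hooks named above; the helper is illustrative, not the actual registry code.

import shutil


def autodetect_provider(plugins):
    """Pick the first registered plugin whose CLI binary is on PATH."""
    for plugin in sorted(plugins, key=lambda p: p.llm_autodetect_priority()):
        if shutil.which(plugin.llm_autodetect_cli_name()):
            return plugin
    return None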

Claude Code Integration

The ClaudeCodeProvider invokes the claude CLI tool.

Command Construction

claude -p --model <alias> --output-format text --dangerously-skip-permissions [extra_args...]

The prompt is written to stdin, and output is streamed from stdout in real-time.

Model Mapping

Tier Claude CLI Alias
large opus
small sonnet

Environment Variables

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier (generic, preferred)
SASE_LLM_SMALL_ARGS Extra CLI args for small tier (generic, preferred)
SASE_CLAUDE_LARGE_ARGS Extra CLI args for large tier (Claude-specific fallback)
SASE_CLAUDE_SMALL_ARGS Extra CLI args for small tier (Claude-specific fallback)

The generic SASE_LLM_*_ARGS variables take precedence. Values are split on whitespace and appended to the command.
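
A small sketch of that precedence and whitespace splitting (the helper name is illustrative):

import os


def extra_args_for_tier(tier: str) -> list[str]:
    """Return extra CLI args for a tier, preferring the generic variable over the Claude-specific one."""
    generic = os.environ.get(f"SASE_LLM_{tier.upper()}_ARGS")
    specific = os.environ.get(f"SASE_CLAUDE_{tier.upper()}_ARGS")
    value = generic if generic is not None else specific
    return value.split() if value else []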

Timer Display

While waiting for a response, a gemini_timer("Waiting for Claude") spinner is shown (unless suppress_output is True).

Gemini CLI Integration

The GeminiProvider invokes Google's Gemini CLI tool.

Command Construction

gemini --yolo [extra_args...]

The prompt is written to stdin, and output is streamed from stdout in real-time.

Default Model

The Gemini provider uses gemini-3-flash-preview as its default model. This can be overridden per-prompt using the %model directive (e.g., %model:gemini-2.5-flash).

Environment Variables

Variable Description
SASE_GEMINI_PATH Path to the Gemini CLI binary (default: "gemini").

Timer Display

While waiting for a response, a gemini_timer("Waiting for Gemini") spinner is shown (unless suppress_output is True).

Codex CLI Integration

The CodexProvider invokes the OpenAI codex CLI tool.

Command Construction

Normal mode:

codex exec --model <model> --dangerously-bypass-approvals-and-sandbox --json --color never --skip-git-repo-check - [extra_args...]

The prompt is written to stdin. Output is streamed as NDJSON events, with assistant text extracted from item.completed events.
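
A sketch of extracting assistant text from that NDJSON stream; the exact event payload layout (item["text"]) is an assumption beyond what this document states.

import json


def extract_codex_text(ndjson_lines):
    """Collect assistant text from item.completed events in a Codex NDJSON stream."""
    chunks = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") == "item.completed":
            text = event.get("item", {}).get("text")  # payload shape assumed
            if text:
                chunks.append(text)
    return "\n".join(chunks)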

Model Mapping

Tier Codex Model
large gpt-5.5
small codex-mini-latest

Plan Mode

When SASE_AGENT_PLAN_MODE is set, Codex runs a two-phase plan/implement flow:

  1. Phase 1 (Planning): Runs with --sandbox read-only and --ask-for-approval on-request. The model generates a plan captured via --output-last-message, on-disk plan files, or streamed response text.
  2. Approval: The plan is presented for user approval with up to 5 feedback-retry rounds.
  3. Phase 2 (Implementation): On approval, runs with full permissions (--dangerously-bypass-approvals-and-sandbox) using the plan content as the prompt.

Environment Variables

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier (generic, preferred)
SASE_LLM_SMALL_ARGS Extra CLI args for small tier (generic, preferred)
SASE_CODEX_PATH Path to the Codex CLI binary (default: PATH, then NVM_BIN)
SASE_CODEX_LARGE_ARGS Extra CLI args for large tier (Codex-specific fallback)
SASE_CODEX_SMALL_ARGS Extra CLI args for small tier (Codex-specific fallback)
SASE_CODEX_DISABLE_SHADOW_HOME Set to 1 to disable the disposable Codex home
SASE_AGENT_PLAN_MODE Enable two-phase plan/implement flow

The generic SASE_LLM_*_ARGS variables take precedence over SASE_CODEX_*_ARGS.

By default, SASE launches Codex with a per-invocation shadow CODEX_HOME under ~/.cache/sase/codex_home/. The shadow home copies config.toml and symlinks other Codex home entries back to the real Codex home so Codex can read auth, hooks, skills, logs, and caches while any config rewrites stay disposable. The shadow directory is removed after each Codex subprocess exits. Set SASE_CODEX_DISABLE_SHADOW_HOME=1 to pass through the inherited environment directly for debugging or emergency compatibility.
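
A minimal sketch of building such a shadow home, assuming the real Codex home is ~/.codex; the function name and cleanup details are illustrative.

import shutil
import tempfile
from pathlib import Path


def make_shadow_codex_home(real_home: Path = Path.home() / ".codex") -> Path:
    """Create a disposable CODEX_HOME: copy config.toml, symlink everything else back."""
    shadow_root = Path.home() / ".cache" / "sase" / "codex_home"
    shadow_root.mkdir(parents=True, exist_ok=True)
    shadow = Path(tempfile.mkdtemp(dir=shadow_root))
    for entry in real_home.iterdir():
        if entry.name == "config.toml":
            shutil.copy2(entry, shadow / entry.name)   # config rewrites stay disposable
        else:
            (shadow / entry.name).symlink_to(entry)    # auth, hooks, skills, logs, caches
    return shadow

# Usage sketch: run Codex with env={**os.environ, "CODEX_HOME": str(shadow)} and
# shutil.rmtree(shadow) after the subprocess exits.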

Timer Display

While waiting for a response, a gemini_timer("Waiting for Codex") spinner is shown (unless suppress_output is True). In plan mode, the timer reads "Waiting for Codex (planning)" during Phase 1 and "Implementing plan" during Phase 2.

Qwen Code Integration

The QwenProvider invokes the qwen CLI tool.

Command Construction

qwen --input-format text --output-format stream-json --yolo --model <model> [extra_args...]

The prompt is written to stdin using Qwen's text input mode. Output is streamed as JSON events; SASE extracts assistant text from assistant events and falls back to the final result text when no assistant text is emitted.

Model Mapping

Tier Qwen Model
large qwen3-coder-plus
small qwen3-coder-flash

Authentication

Configure Qwen Code through its supported auth and settings flow before using it from SASE. Qwen OAuth free tier access ended on 2026-04-15; use API keys, Alibaba Cloud Coding Plan, OpenRouter, Fireworks, or another Qwen-supported provider instead of relying on the discontinued OAuth free tier.

Environment Variables

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier (generic, preferred)
SASE_LLM_SMALL_ARGS Extra CLI args for small tier (generic, preferred)
SASE_QWEN_PATH Path to the Qwen Code CLI binary (default: qwen)
SASE_QWEN_LARGE_ARGS Extra CLI args for large tier (Qwen-specific fallback)
SASE_QWEN_SMALL_ARGS Extra CLI args for small tier (Qwen-specific fallback)

The generic SASE_LLM_*_ARGS variables take precedence over SASE_QWEN_*_ARGS.

Qwen Code config is left in Qwen's normal locations (~/.qwen/settings.json and project .qwen/settings.json). SASE does not create a shadow Qwen home in this first implementation: a local Qwen install was unavailable during this phase, so it could not be verified whether normal headless runs mutate that config.

Timer Display

While waiting for a response, a gemini_timer("Waiting for Qwen") spinner is shown (unless suppress_output is True).

OpenCode Integration

The OpenCodeProvider invokes the opencode CLI tool.

Command Construction

opencode run --format json --dangerously-skip-permissions --model <provider/model> --dir <cwd> [extra_args...] <prompt>

The prompt is passed as OpenCode's run [message..] argument without shell interpolation. Output is streamed as JSONL events; SASE extracts assistant text from text events, captures errors from error events, and accumulates token counters from step_finish events when OpenCode reports them.
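
A sketch of that event handling; the payload field names ("text", "error", "tokens") are assumptions, only the event types are documented above.

import json


def consume_opencode_events(jsonl_lines):
    """Accumulate assistant text, errors, and token counters from OpenCode JSONL output."""
    text, errors, tokens = [], [], {"input": 0, "output": 0}
    for line in jsonl_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        kind = event.get("type")
        if kind == "text":
            text.append(event.get("text", ""))
        elif kind == "error":
            errors.append(event.get("error", ""))
        elif kind == "step_finish":
            usage = event.get("tokens", {})            # counters only when OpenCode reports them
            tokens["input"] += usage.get("input", 0)
            tokens["output"] += usage.get("output", 0)
    return "".join(text), errors, tokens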

Model Mapping

OpenCode model IDs normally include an upstream provider prefix. Use %model:opencode/<provider/model> to route a single SASE prompt to a concrete OpenCode model.

Tier OpenCode Model
large anthropic/claude-sonnet-4-5
small openai/gpt-5-mini

Authentication and Config

Configure OpenCode through its normal auth and settings flow before using it from SASE. OpenCode stores auth under its XDG data directory and reads config from its XDG config directory plus project .opencode config. Use opencode models to inspect the models available in your configured OpenCode environment.

SASE deploys OpenCode skills under ~/.config/opencode/skills/, which OpenCode scans as part of its config directory. SASE does not create a shadow OpenCode data/config home in this first implementation because OpenCode's normal headless run writes session/database state under its XDG data directory while reading auth/config from the standard locations.

Environment Variables

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier (generic, preferred)
SASE_LLM_SMALL_ARGS Extra CLI args for small tier (generic, preferred)
SASE_OPENCODE_PATH Path to the OpenCode CLI binary (default: opencode)
SASE_OPENCODE_LARGE_ARGS Extra CLI args for large tier (OpenCode-specific fallback)
SASE_OPENCODE_SMALL_ARGS Extra CLI args for small tier (OpenCode-specific fallback)

The generic SASE_LLM_*_ARGS variables take precedence over SASE_OPENCODE_*_ARGS.

Timer Display

While waiting for a response, a gemini_timer("Waiting for OpenCode") spinner is shown (unless suppress_output is True).

External Provider Plugins

Additional LLM providers are shipped as external packages that declare [project.entry-points."sase_llm"] in their own pyproject.toml. Plugins carry all their own metadata (model names, skill deploy path, CLI status color, auto-detect priority, retry defaults) via pluggy @hookimpl methods — sase core has no plugin-specific branching.

External provider packages own their CLI invocation details, model metadata, skill deployment path, auto-detect priority, and retry defaults. Install the provider package in the same environment as sase to make its sase_llm entry point available.
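
A sketch of the plugin class such a package might expose (its entry point is declared in pyproject.toml under the same group shown for the built-ins above). The plugin and model names are invented; the hook names are the ones mentioned in this document, and the hookimpl project name is assumed to match the entry-point group.

import pluggy

hookimpl = pluggy.HookimplMarker("sase_llm")  # project name assumed; see LLMPluginManager


class MyLLMPlugin:
    """Hypothetical external plugin implementing the hooks this document mentions."""

    @hookimpl
    def llm_autodetect_cli_name(self):
        return "myllm"  # binary probed on PATH during auto-detection

    @hookimpl
    def llm_autodetect_priority(self):
        return 50  # slots in after the built-in providers

    @hookimpl
    def llm_known_model_names(self):
        return {"myllm-pro", "myllm-mini"}  # enables bare %model resolution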

Configuration

The LLM provider reads its configuration from ~/.config/sase/sase.yml under the llm_provider key.

Config File

llm_provider:
  provider: claude # or "codex", "gemini", "qwen", "opencode" (default: auto-detect)
  model_tier_map:
    large: opus
    small: sonnet

Config Fields

Field Type Default Description
llm_provider.provider string auto-detect Which registered provider to use. Auto-detects by plugin-declared priority; built-ins default to claude → codex → qwen → opencode → gemini.
llm_provider.model_tier_map.large string - Model identifier for the large tier
llm_provider.model_tier_map.small string - Model identifier for the small tier

Per-Prompt Provider Switching

The %model directive (see xprompt directives) can switch both the model and the LLM provider for a single prompt. Provider resolution uses two strategies:

Explicit Provider/Model Syntax

Use provider/model to specify both explicitly:

%model:codex/o3
%model:claude/opus
%model:gemini/gemini-2.5-pro
%model:qwen/qwen3-coder-plus
%model:opencode/anthropic/claude-sonnet-4-5

Automatic Provider Resolution

Known model names are automatically mapped to their provider:

Model Name Provider
opus, sonnet, haiku claude
gpt-5.5, gpt-5.3-codex, codex-mini-latest, o3, o4-mini, gpt-5.4, gpt-4.1, gpt-4.1-mini, gpt-4o, gpt-4o-mini codex
gemini-2.5-pro, gemini-2.5-flash, gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-2.0-flash gemini
qwen3-coder-plus, qwen3-coder-flash, qwen3-max, qwen-plus, qwen-max qwen
anthropic/claude-sonnet-4-5, openai/gpt-5-mini, qwen/qwen3-coder-plus opencode

Each installed plugin contributes its own model names via the llm_known_model_names() hook.

For unrecognized model names, the default provider is used.

Source: src/sase/llm_provider/registry.py

Model Tier System

The model tier system abstracts away specific model names. Callers request either "large" (most capable) or "small" (faster/cheaper), and the provider maps the tier to a concrete model.

Type Definition

ModelTier = Literal["large", "small"]

Legacy Mapping

The old "big"/"little" terminology is still supported for backward compatibility:

Old Value New Tier Display Label
"big" "large" BIG
"little" "small" LITTLE

The model_size parameter on invoke_agent() is deprecated. Use model_tier instead.

Global Override

The model tier can be overridden globally via environment variable or CLI flag. The override forces ALL invocations to use the specified tier regardless of what the caller requests.

Resolution order:

  1. SASE_MODEL_TIER_OVERRIDE env var (accepts "large", "small", "big", "little")
  2. SASE_MODEL_SIZE_OVERRIDE env var (legacy, same values)
  3. --model-tier / --model-size CLI flag (sets the env var)
  4. Caller's model_tier parameter (default: "large")
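
A sketch of that resolution order (the helper name is illustrative; the CLI flag is covered because it sets the env var):

import os

_TIER_ALIASES = {"big": "large", "little": "small", "large": "large", "small": "small"}


def resolve_model_tier(requested: str = "large") -> str:
    """Apply the documented order: tier override env var, legacy env var, then the caller's tier."""
    for var in ("SASE_MODEL_TIER_OVERRIDE", "SASE_MODEL_SIZE_OVERRIDE"):
        value = os.environ.get(var, "").strip().lower()
        if value in _TIER_ALIASES:
            return _TIER_ALIASES[value]
    return _TIER_ALIASES.get(requested, "large")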

Temporary Default Override

In addition to the tier-based global override, sase supports a concrete provider/model override that acts as a temporary session-level default. This is the override the ACE ,P chord writes (see docs/ace.md for the TUI flow).

The temporary override only changes the default provider/model selection for new agent launches. It does not affect:

  • Already-running agents — they keep whatever provider/model they were launched with.
  • Explicit %model prompt directives — they still take precedence.
  • An explicit provider_name= argument to invoke_agent() — it still wins.
  • SASE_MODEL_TIER_OVERRIDE / SASE_MODEL_SIZE_OVERRIDE — those force a tier across all invocations regardless of this override; they layer on top, not under.

Resolution Order (default provider/model)

When no %model directive and no explicit provider_name are present, the default is resolved as:

  1. Active temporary override at ~/.sase/llm_override.json (if not expired).
  2. llm_provider.provider from the merged sase.yml config.
  3. Auto-detection by plugin-declared priority (built-ins: claude, codex, qwen, opencode, then gemini).

A concrete temporary override sets both the default provider and a concrete model_override for the next launch — so the agent metadata (running marker, plan review badge, agent rows) reflects the actual model that will run, not just the configured default.

State File

{
  "provider": "opencode",
  "model": "anthropic/claude-sonnet-4-5",
  "raw_model": "opencode/anthropic/claude-sonnet-4-5",
  "created_at": 1777470000.0,
  "expires_at": 1777473600.0,
  "source": "ace"
}
Field Type Description
provider str Resolved provider name (e.g. "claude", "codex", "opencode").
model str Concrete model passed to the provider (e.g. "o3", "opus").
raw_model str Original user input (e.g. "codex/o3", "opencode/anthropic/...").
created_at float Unix timestamp when the override was set.
expires_at float \| None Unix timestamp when the override expires; null means "until cleared".
source str Free-form tag indicating who set the override (e.g. "ace").

Writes are atomic (temp file + os.replace). Reads are best-effort self-cleaning: an expired or unparseable file is deleted on next access, so a forgotten override never lingers past its expires_at, even with no TUI running.
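
A sketch of that write/read discipline, assuming the JSON layout shown above (helper names are illustrative, not the actual temporary_override.py API):

import json
import os
import tempfile
import time
from pathlib import Path

OVERRIDE_PATH = Path.home() / ".sase" / "llm_override.json"


def write_override(data: dict) -> None:
    """Atomic write: temp file in the same directory, then os.replace()."""
    OVERRIDE_PATH.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=OVERRIDE_PATH.parent)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh)
    os.replace(tmp, OVERRIDE_PATH)


def read_override(now: float | None = None) -> dict | None:
    """Best-effort self-cleaning read: delete expired or unparseable files and return None."""
    now = time.time() if now is None else now
    try:
        data = json.loads(OVERRIDE_PATH.read_text())
        expires = data.get("expires_at")
        if expires is not None and now >= expires:
            raise ValueError("override expired")
        return data
    except FileNotFoundError:
        return None
    except (ValueError, OSError):
        OVERRIDE_PATH.unlink(missing_ok=True)
        return None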

Model Resolution

The user-supplied raw_model is normalized through the same rules as %model:

  • provider/model selects the provider explicitly (e.g. codex/o3 or opencode/anthropic/claude-sonnet-4-5).
  • A bare known model name infers its provider from plugin metadata (e.g. sonnet → claude).
  • An unknown bare model is accepted and runs on the current default provider, matching %model behavior.

Duration Parsing

Durations accept compact unit suffixes: 15m, 1h, 1h30m, 90m, 2h15m30s. Bare integers are interpreted as minutes (45 → 45 minutes). The case-insensitive sentinel until cleared (or until_cleared) means "no expiry — persists until the user clears it from the TUI or another sase process clears the state file."
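
A sketch of that duration grammar; the function mirrors the described behavior of parse_override_duration but is illustrative.

import re


def parse_duration_seconds(value: str) -> int | None:
    """Parse '15m', '1h30m', '2h15m30s', bare minutes, or the 'until cleared' sentinel.

    Returns seconds, or None for "no expiry".
    """
    text = value.strip().lower()
    if text in ("until cleared", "until_cleared"):
        return None
    if text.isdigit():                                   # bare integer → minutes
        return int(text) * 60
    matches = re.findall(r"(\d+)([hms])", text)
    if not matches or "".join(n + u for n, u in matches) != text:
        raise ValueError(f"unrecognized duration: {value!r}")
    factors = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * factors[u] for n, u in matches)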

Public API

The override primitives live in src/sase/llm_provider/temporary_override.py:

Function Purpose
get_active_temporary_override(now=None) Read the active override (auto-deletes expired/malformed files).
set_temporary_override(raw, dur, source=...) Write a new override, replacing any existing one.
clear_temporary_override() Remove the override file. Safe to call when nothing is active.
parse_override_duration(value) Parse a user-facing duration string into seconds (or None).
resolve_effective_default_provider_model() Centralized helper used by metadata pre-resolution paths.

Examples

  • ACE chord ,P, pick codex/o3, duration 1h → ~/.sase/llm_override.json is written; new launches default to CODEX(o3) for the next hour.
  • ACE chord ,P, pick opencode/anthropic/claude-sonnet-4-5, duration 1h → new launches default to OPENCODE(anthropic/claude-sonnet-4-5).
  • ACE chord ,P, pick sonnet, duration 30m → known bare model; provider resolves to claude via plugin metadata.
  • ACE chord ,P, choose Clear override → ~/.sase/llm_override.json is removed; defaults revert to permanent config / autodetect.

Environment Variables

Complete reference of environment variables used by the LLM provider layer.

Generic (Provider-Agnostic)

Variable Description
SASE_LLM_LARGE_ARGS Extra CLI args for large tier invocations
SASE_LLM_SMALL_ARGS Extra CLI args for small tier invocations
SASE_MODEL_TIER_OVERRIDE Force all invocations to a specific model tier
SASE_MODEL_SIZE_OVERRIDE Legacy alias for SASE_MODEL_TIER_OVERRIDE

Claude-Specific

Variable Description
SASE_CLAUDE_LARGE_ARGS Claude-specific extra args for large tier
SASE_CLAUDE_SMALL_ARGS Claude-specific extra args for small tier

Codex-Specific

Variable Description
SASE_CODEX_LARGE_ARGS Codex-specific extra args for large tier
SASE_CODEX_SMALL_ARGS Codex-specific extra args for small tier
SASE_AGENT_PLAN_MODE Enable Codex two-phase plan/implement flow

Qwen-Specific

Variable Description
SASE_QWEN_PATH Path to the Qwen Code CLI binary
SASE_QWEN_LARGE_ARGS Qwen-specific extra args for large tier
SASE_QWEN_SMALL_ARGS Qwen-specific extra args for small tier

Gemini-Specific

Variable Description
SASE_GEMINI_PATH Path to the Gemini CLI binary (default: "gemini").

External provider plugins document their own environment variables in their respective repos.

VCS Provider

Variable Description
SASE_VCS_PROVIDER Override VCS provider ("git", "hg", or "auto")

CLI Flags

ace

Flag Values Description
-m, --model-tier large, small Override model tier for all LLM invocations
--model-size big, little Deprecated alias for --model-tier
--vcs-provider git, hg, auto Override VCS provider

axe

Flag Values Description
--vcs-provider git, hg, auto Override VCS provider

The ace command wires --model-tier / --model-size into the model_tier_override parameter of AceApp. The --vcs-provider flag is wired to the SASE_VCS_PROVIDER environment variable for downstream resolution.

Retry and Fallback

The LLM provider layer supports per-provider retry and fallback configuration. When an agent encounters a retryable error, it can automatically wait and retry, then optionally fall back to an alternate model.

Configuration

Retry behavior is configured per provider under llm_provider.retry in sase.yml:

llm_provider:
  retry:
    gemini:
      max_retries: 3
      error_patterns:
        - "An unexpected critical error occurred:"
      wait_times: [60, 300, 1800]
      fallback_model: "gemini-3-flash-preview"

Config Fields

Field Type Default Description
max_retries int 0 Maximum retry attempts. 0 disables retrying.
error_patterns list[str] [] Case-insensitive substring patterns matched against error output.
wait_times list[int] [30] Per-retry wait times in seconds. Last value reused if list is too short.
fallback_model str|null null Alternate model to use after exhausting all retries.
continuation_prompt str "" Text prepended to state.current_prompt on every retry (used to nudge the agent).
spawn_new_agent bool false Opt in to spawn-on-retry: a retryable error spawns a fresh detached child agent (as if sase run -d had been invoked) instead of in-process retry. See Spawn-on-Retry below.

Default Configuration

Gemini and Claude have retry defaults (defined in default_config.yml); external provider plugins may declare their own via the llm_default_retry_config() hook.

Gemini:

  • max_retries: 3
  • error_patterns: ["An unexpected critical error occurred:"]
  • wait_times: [60, 300, 1800] (1 min, 5 min, 30 min)
  • fallback_model: "gemini-3-flash-preview"

Claude:

  • max_retries: 3
  • error_patterns: ["API Error: 500", "API Error: 529", "Internal server error", "overloaded_error"]
  • wait_times: [60, 300, 1800] (1 min, 5 min, 30 min)
  • fallback_model: "sonnet"

Built-In "Prompt is too long" Recovery (Claude)

Claude has an additional built-in retry entry registered internally (not in default_config.yml) that auto-recovers agents from context-overflow errors without any user config:

  • error pattern: "Prompt is too long"
  • max_retries: 3
  • wait_times: [0] — zero-delay retry so a fresh session restarts immediately
  • continuation_prompt: A short nudge that tells the coder to inspect git status / git diff before resuming, since prior edits are preserved on disk when the retry wipes only the in-memory context

User-supplied llm_provider.retry.claude config is merged on top of these built-ins: explicit falsy values (max_retries: 0 to opt out entirely, continuation_prompt: "" to disable the nudge) override the built-in via key-presence checks. error_patterns is a de-duplicated union of built-in and user lists.

On every retry attempt the continuation_prompt (if non-empty) is idempotently prepended to state.current_prompt before the next invocation — the prepend is gated on a startswith check so repeated retries don't stack duplicate nudges. Workspaces are preserved across built-in context-overflow retries (no workspace wipe), so on-disk edits remain available to the restarted session.
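
A sketch of that idempotent prepend (the state object's shape is assumed):

def apply_continuation_prompt(state, continuation_prompt: str) -> None:
    """Prepend the retry nudge once; repeated retries must not stack duplicate nudges."""
    if not continuation_prompt:
        return
    if not state.current_prompt.startswith(continuation_prompt):
        state.current_prompt = continuation_prompt + "\n\n" + state.current_prompt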

Retry Flow

Error detected
│
├── Does error match error_patterns? (case-insensitive substring)
│   ├── No  → fail immediately
│   └── Yes → retry_count < max_retries?
│       ├── Yes → wait (wait_times[retry_count]) → retry
│       └── No  → fallback_model configured?
│           ├── Yes → switch model via SASE_MODEL_OVERRIDE → retry once
│           └── No  → fail

Wait periods are interruptible — if the agent is killed during a wait, it stops immediately.
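
The same flow as a Python sketch. `invoke(model_override=None)` is an assumed callable returning (ok, error_text), and `config` is assumed to carry the fields from the table above; this is not the actual run_agent_exec code.

import time


def run_with_retry(invoke, config, is_interrupted=lambda: False):
    """Drive the documented flow: pattern match, bounded retries, optional fallback."""
    retry_count = 0
    while True:
        ok, error_text = invoke()
        if ok:
            return True
        if not any(p.lower() in error_text.lower() for p in config.error_patterns):
            return False  # error does not match any pattern: fail immediately
        if retry_count < config.max_retries:
            waits = config.wait_times or [30]
            wait = waits[min(retry_count, len(waits) - 1)]  # last value reused if list is short
            for _ in range(wait):                            # interruptible one-second sleeps
                if is_interrupted():
                    return False
                time.sleep(1)
            retry_count += 1
            continue
        if config.fallback_model:
            ok, _ = invoke(model_override=config.fallback_model)
            return ok  # one final attempt on the fallback model
        return False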

TUI Display

The ACE Agents tab reflects retry state (see Retry/Fallback Display):

  • RETRYING (Ns) — Waiting before the next attempt (bold orange, with countdown)
  • ↻N — Retry count annotation on running agents
  • ▸Model — Fallback model annotation (e.g., ↻3▸flash)

Metadata Tracking

After execution completes, retry metadata is written to done.json in the agent's artifacts directory:

{
  "retry_count": 2,
  "retry_errors": ["An unexpected critical error occurred: ..."],
  "used_fallback": false
}

Source: src/sase/llm_provider/retry_config.py, src/sase/axe/run_agent_exec.py

Spawn-on-Retry

When ProviderRetryConfig.spawn_new_agent=True, a retryable error spawns a fresh detached child agent (as if sase run -d had been invoked) instead of running the next attempt in-process. The failing parent transfers its workspace claim to the child via transfer_workspace_claim() and exits with status FAILED (RETRIED). This trades the small cost of a fresh process for two benefits:

  • The workspace is preserved by design — the child skips prepare_workspace() and inherits the parent's in-progress edits via the transferred workspace claim. (Legacy in-process retry runs prepare_workspace() between attempts and wipes uncommitted file edits unless preserve_workspace=True.)
  • A retry boundary becomes a real process boundary, which is more robust against memory leaks, lingering child processes, and stale interpreter state.

Linkage fields (written to both agent_meta.json and done.json so retry chains are queryable from either side):

Field Meaning
retry_of_timestamp Backward link: the parent agent's run timestamp.
retried_as_timestamp Forward link: the child agent's run timestamp (written on the parent at handoff).
retry_chain_root_timestamp The root agent's timestamp — stable across the entire chain.
retry_attempt Depth in the chain (1-based).

State is carried across the boundary by a retry_handoff.json file written to the parent's artifacts directory; the child reads it before launch.

Fallback behavior: spawn-on-retry is opt-in (default false). If spawning fails (e.g. workspace transfer fails), the legacy in-process retry runs as a fallback so the user is never worse off.

Source: src/sase/axe/run_agent_retry_spawn.py, src/sase/llm_provider/retry_config.py

Token Usage Tracking

The LLM provider layer tracks token usage for Claude Code agent runs. Input tokens, output tokens, and cache-read tokens are extracted from the Claude Code stream-json result events and persisted as a usage.json artifact in the agent run directory.

Artifact Format

{
  "input_tokens": 12345,
  "output_tokens": 6789,
  "cache_read_tokens": 3456
}

When telemetry is enabled, token counts are also recorded as Prometheus counters (sase_llm_input_tokens_total, sase_llm_output_tokens_total, sase_llm_cache_read_tokens_total) for monitoring and dashboards. See docs/telemetry.md for the full telemetry reference.

Source: src/sase/llm_provider/_subprocess.py, src/sase/llm_provider/types.py

Prompt Preprocessing Pipeline

Before any prompt reaches a provider, it passes through a 6-step preprocessing pipeline defined in preprocessing.py.

Steps

# Step Syntax Description
1 xprompt references #name Expand reusable inline prompt snippets from xprompts
2 Command substitution $(cmd) Execute shell commands and inline their output
3 File references @path Inline file contents (copy absolute/tilde paths)
4 Jinja2 rendering {{ var }} Render Jinja2 templates after all prior expansions
5 Prettier formatting - Format with prettier for consistent markdown
6 Comment stripping <!-- ... --> Remove HTML/markdown comments

Order Matters

The pipeline runs in strict order. Jinja2 rendering (step 4) happens after xprompt, command substitution, and file reference expansion, so templates can reference content injected by earlier steps.

Home Mode

When is_home_mode=True, file reference processing skips copying files (step 3). This is used when the invocation doesn't need side effects from @path references.

Source Functions

The preprocessing steps delegate to functions from two libraries:

  • xprompt: process_xprompt_references(), is_jinja2_template(), render_toplevel_jinja2()
  • gemini_wrapper.file_references: process_command_substitution(), process_file_references(), format_with_prettier(), strip_html_comments()
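
A sketch of how the six steps chain together, using the function names listed above; the keyword arguments and exact signatures are assumptions.

from xprompt import process_xprompt_references, is_jinja2_template, render_toplevel_jinja2
from gemini_wrapper.file_references import (
    process_command_substitution,
    process_file_references,
    format_with_prettier,
    strip_html_comments,
)


def preprocess_prompt(prompt: str, *, is_home_mode: bool = False) -> str:
    prompt = process_xprompt_references(prompt)                            # 1. #name snippets
    prompt = process_command_substitution(prompt)                          # 2. $(cmd) output
    prompt = process_file_references(prompt, copy_files=not is_home_mode)  # 3. @path contents
    if is_jinja2_template(prompt):
        prompt = render_toplevel_jinja2(prompt)                            # 4. {{ var }} templates
    prompt = format_with_prettier(prompt)                                  # 5. consistent markdown
    return strip_html_comments(prompt)                                     # 6. <!-- ... --> removal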

Subprocess Streaming

Providers use shared helpers in _subprocess.py to stream LLM output in real-time.

Mechanism

  1. The provider spawns the CLI tool via subprocess.Popen with stdout=PIPE and stderr=PIPE; providers that consume prompts from stdin also set stdin=PIPE.
  2. The prompt is supplied using the provider's documented transport, either stdin or an argv message argument.
  3. Both stdout and stderr file descriptors are set to non-blocking mode via os.set_blocking().
  4. A select.select() loop with a 0.1s timeout polls for readable data on both streams.
  5. Lines are read and optionally printed to the console in real-time.
  6. After the process exits (process.poll() is not None), any remaining buffered output is drained.
  7. The function returns (stdout_content, stderr_content, return_code).
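
A condensed sketch of that loop (simplified: no console printing, no live-reply file, stdin transport only):

import os
import select
import subprocess


def stream_process_output(cmd, prompt=None):
    """Spawn `cmd`, optionally feed `prompt` on stdin, and drain stdout/stderr via select()."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE if prompt is not None else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if prompt is not None:
        proc.stdin.write(prompt.encode())
        proc.stdin.close()
    for stream in (proc.stdout, proc.stderr):
        os.set_blocking(stream.fileno(), False)          # non-blocking reads
    out, err = bytearray(), bytearray()
    while True:
        readable, _, _ = select.select([proc.stdout, proc.stderr], [], [], 0.1)
        for stream in readable:
            chunk = stream.read() or b""
            (out if stream is proc.stdout else err).extend(chunk)
        if proc.poll() is not None:
            out.extend(proc.stdout.read() or b"")        # drain remaining buffered output
            err.extend(proc.stderr.read() or b"")
            return out.decode(), err.decode(), proc.returncode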

Live Reply File

When SASE_ARTIFACTS_DIR is set, the streaming output is also written in real-time to <SASE_ARTIFACTS_DIR>/live_reply.md. This file is used by the ACE TUI Agents tab to display the agent's reply as it streams in, and remains available after execution completes for the metadata panel's AGENT REPLY section.

Output Suppression

When suppress_output=True, lines are still captured but not printed to the console. This is used for background invocations where the caller only needs the final result.

Postprocessing

After a provider returns (or raises an error), the orchestration layer runs postprocessing steps.

On Success (postprocess_success)

  1. Audio notification: Plays a sound via run_bam_command("Agent reply received") (skipped if suppress_output).
  2. Log to sase.md: Appends a timestamped entry with the prompt and response to <artifacts_dir>/sase.md (if artifacts_dir is set).
  3. Save chat history: Writes to ~/.sase/chats/ if workflow is set. See Chat History.

On Error (postprocess_error)

  1. Rich error display: Prints the prompt and error via print_prompt_and_response() with an _ERROR suffix on the agent type label (skipped if suppress_output).
  2. Log to sase.md: Same as success, but the response is the error message and the agent type gets an _ERROR suffix.
  3. Save error chat history: Writes to ~/.sase/chats/ with an _ERROR agent suffix.

sase.md Log Format

Each entry in the log file follows this format:

## <timestamp> - <agent_type> - iteration <N> - tag <workflow_tag>

### PROMPT:

```
<prompt text>
```

### RESPONSE:

```
<response text>
```

---

Prompt File Saving

Before invocation, the preprocessed prompt is saved to <artifacts_dir>/<agent_type>_prompt.md (or <agent_type>_iter_<N>_prompt.md if an iteration number is set). This allows reviewing the exact prompt that was sent.

Chat History

Chat histories are stored as markdown files in ~/.sase/chats/.

File Naming

<branch_or_workspace>-<workflow>-[<agent>-]<timestamp>.md
Part Source Example
branch_or_workspace Output of branch_or_workspace_name my_feature
workflow Workflow name, normalized crs, run
agent Agent type (omitted if same as workflow) editor, planner
timestamp YYmmdd_HHMMSS format 260214_153042

Dashes and slashes in workflow names are normalized to underscores.

File Format

# Chat History - <workflow> (<agent>)

**Timestamp:** <display_timestamp>

## Previous Conversation

<previous history if resuming>

---

## Prompt

<prompt text>

## Response

<response text>

Resume Support

The sase run --resume flag resumes a previous conversation by agent name. The #resume workflow resolves the agent name to its artifacts directory, extracts the response path from done.json, and delegates to #resume_by_chat, which loads the chat history and prepends it to the new conversation. The --resume flag also accepts a history file basename or full path for direct chat-file-based resumption via the #resume_by_chat workflow.

Resume expansion is recursive: if the loaded chat history itself contains #resume or #resume_by_chat references, those are expanded inline as well. Cycle detection prevents infinite loops when chat histories reference each other.
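
A sketch of the recursive expansion with a cycle guard; the `#resume_by_chat <file>` reference syntax assumed by the regex is a simplification of the real directive handling.

import re
from pathlib import Path

_RESUME_RE = re.compile(r"#resume_by_chat\s+(\S+)")


def expand_resume_references(text: str, chats_dir: Path, seen: set[str] | None = None) -> str:
    """Inline referenced chat histories recursively, refusing to revisit a file (cycle guard)."""
    seen = set() if seen is None else seen

    def _inline(match: re.Match) -> str:
        name = match.group(1)
        if name in seen:
            return f"<!-- cycle detected, skipping {name} -->"
        seen.add(name)
        history = (chats_dir / name).read_text()
        return expand_resume_references(history, chats_dir, seen)

    return _RESUME_RE.sub(_inline, text)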

Invocation Lifecycle

The invoke_agent() function in _invoke.py orchestrates the complete lifecycle of an LLM invocation. Here is the end-to-end flow:

invoke_agent(prompt, agent_type, model_tier, ...)
│
├── 1. Handle deprecated model_size → model_tier mapping
├── 2. Check SASE_MODEL_TIER_OVERRIDE / SASE_MODEL_SIZE_OVERRIDE env vars
├── 3. Build LoggingContext from parameters
│
├── 4. Preprocess prompt (6-step pipeline)
│   ├── xprompt references (#name)
│   ├── Command substitution ($(cmd))
│   ├── File references (@path)
│   ├── Jinja2 rendering ({{ var }})
│   ├── Prettier formatting
│   └── Comment stripping
│
├── 5. Display decision counts (if not suppressed)
├── 6. Print prompt via Rich (if not suppressed)
├── 7. Generate or use provided timestamp
├── 8. Save prompt to artifacts directory
│
├── 9. Get provider from registry and invoke
│   ├── Build CLI command with flags
│   ├── Spawn subprocess (Popen)
│   ├── Supply prompt via provider transport
│   └── Stream stdout/stderr in real-time
│
├── 10. Postprocess
│   ├── Success path:
│   │   ├── Audio notification
│   │   ├── Log to sase.md
│   │   └── Save chat history
│   └── Error path:
│       ├── Rich error display
│       ├── Log error to sase.md
│       └── Save error chat history
│
└── 11. Return AIMessage(content=response)

Parameters

Parameter Type Default Description
prompt str (required) Raw prompt to send
agent_type str (required) Agent type label (e.g., "editor")
model_tier ModelTier "large" Model tier to use
model_size "big" \| "little" \| None None Deprecated, use model_tier
iteration int \| None None Iteration number for logging
workflow_tag str \| None None Workflow tag for logging
artifacts_dir str \| None None Directory for sase.md and prompt files
workflow str \| None None Workflow name for chat history
suppress_output bool False Suppress console output
timestamp str \| None None Shared timestamp (YYmmdd_HHMMSS)
is_home_mode bool False Skip file copying for @ references
decision_counts dict[str, Any] \| None None Planning agent decision counts
provider_name str \| None None Override provider (default from config)

Return Value

Always returns an AIMessage (from langchain_core.messages). On error, the content field contains the error message rather than a response.
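
A minimal usage sketch; the import path is assumed from the source layout above (invoke_agent lives in _invoke.py and is re-exported from the package).

from sase.llm_provider import invoke_agent

message = invoke_agent(
    "Summarize the open TODOs in this repository.",
    agent_type="editor",
    model_tier="small",
    provider_name="claude",      # optional; omit to use the configured default provider
    suppress_output=True,        # capture the reply without streaming it to the console
)
print(message.content)           # AIMessage.content holds the response (or the error text)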