Models & Providers

How agent-afk routes model names to providers, and how to use Anthropic, OpenAI, and local OpenAI-compatible shims.

agent-afk speaks to two provider families through a single abstraction in src/agent/providers/. The routing happens automatically based on the model name — no global AFK_PROVIDER needed in the common case.

Provider families

anthropic-direct (default)

Wraps the @anthropic-ai/sdk Messages API. Selected automatically for:

claude-* model IDs
Identity aliases (fixed): opus/opus_1m, sonnet/sonnet_1m, haiku, fable
'anthropic' (silent alias)

openai-compatible

Talks directly to the OpenAI Chat Completions API (or any compatible endpoint via AFK_OPENAI_BASE_URL). Selected automatically for:

gpt-*, o1*, o3*, o4*, codex-*
deepseek-*, mistral-*, mixtral-*, llama-*, qwen-* — common third-party shim families (OpenRouter, Together, Fireworks, DeepSeek, etc.)
HuggingFace-style org/model IDs (e.g. mlx-community/Qwen3-32B-4bit, Qwen/...) served by local OpenAI-shim runners — any ID containing /
'openai-codex' — deprecated alias from before 2026-05-18; the underlying @openai/codex-sdk harness has been removed and this now resolves to the same openai-compatible provider. Still accepted for back-compat; will be removed in a future major release.
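The name-based rules for both families can be sketched as a single lookup. This is a minimal illustration, not the code in src/agent/providers/: the function name routeByModelName is hypothetical, matching is case-sensitive for simplicity, and treating unmatched names as anthropic-direct is an assumption drawn from that family being labelled the default.

```typescript
type ProviderFamily = "anthropic-direct" | "openai-compatible";

// Identity aliases plus the silent 'anthropic' alias, as listed above.
const ANTHROPIC_ALIASES = new Set([
  "opus", "opus_1m", "sonnet", "sonnet_1m", "haiku", "fable", "anthropic",
]);

// Name prefixes that select the openai-compatible family.
const OPENAI_PREFIXES = [
  "gpt-", "o1", "o3", "o4", "codex-",
  "deepseek-", "mistral-", "mixtral-", "llama-", "qwen-",
];

// Hypothetical helper: maps a model name to a provider family.
function routeByModelName(model: string): ProviderFamily {
  if (ANTHROPIC_ALIASES.has(model) || model.startsWith("claude-")) {
    return "anthropic-direct";
  }
  if (
    model === "openai-codex" || // deprecated alias
    model.includes("/") ||      // HuggingFace-style org/model IDs
    OPENAI_PREFIXES.some((prefix) => model.startsWith(prefix))
  ) {
    return "openai-compatible";
  }
  // Assumption: anything unmatched falls back to the default family.
  return "anthropic-direct";
}
```

Under this sketch, an ID that matches no rule (for example one starting with qwen but without the hyphen) would fall through to the default, which is the kind of case an explicit provider setting is meant to cover.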

chatgpt-oauth (per-slot)

Unlike the two families above, this is not a name-based routing rule. It is an explicit per-slot provider override set in the models block of afk.config.json (see Model Slots). A tier bound to provider: "chatgpt-oauth" (or the "chatgpt" shorthand) still routes to openai-compatible, but additionally forces the ChatGPT-subscription OAuth credential (~/.codex/auth.json, written by codex login; see ChatGPT-subscription OAuth) for that tier only. This applies regardless of OPENAI_API_KEY / CODEX_API_KEY, and does not require setting the global AFK_OPENAI_CHATGPT_OAUTH flag.

This lets a ChatGPT-subscription model, a separately-keyed OpenAI model, and an Anthropic model coexist in the same session, with /model switching cleanly between them:

{
  "models": {
    // ChatGPT subscription — forces ~/.codex/auth.json for this tier only.
    "medium": { "id": "gpt-5.6", "provider": "chatgpt-oauth" },
    // A separately-keyed OpenAI model, unaffected by the tier above.
    "small": { "id": "qwen3.7-plus", "provider": "openai" }
    // (+ AFK_MODEL_SMALL_API_KEY / AFK_MODEL_SMALL_BASE_URL)
  }
}
// The main model stays Anthropic (sonnet/opus) — all three coexist and
// /model-switch cleanly between them.

Previously, ChatGPT-subscription OAuth was selectable only via the global AFK_OPENAI_CHATGPT_OAUTH flag, which sat at the lowest tier of a single global OpenAI auth chain. An ambient OPENAI_API_KEY blocked the OAuth fall-through entirely, so a subscription model and a custom-keyed OpenAI model could not be used side by side. The global flag is unchanged and still works for a single-model setup.
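The credential precedence described in this section can be pictured roughly as follows. This is a sketch under several assumptions, not the actual resolver: the function and type names are hypothetical, the per-slot variable pattern is extrapolated from AFK_MODEL_SMALL_API_KEY, the relative order of OPENAI_API_KEY and CODEX_API_KEY is a guess, and any non-empty value of AFK_OPENAI_CHATGPT_OAUTH is treated as enabling the flag.

```typescript
type SlotProvider = "chatgpt-oauth" | "chatgpt" | "openai" | undefined;

type Credential =
  | { kind: "chatgpt-oauth"; path: string }
  | { kind: "api-key"; source: string }
  | { kind: "none" };

interface Env { [name: string]: string | undefined }

// Hypothetical resolver for one model slot (e.g. "small", "medium").
function resolveOpenAICredential(
  slot: string,
  provider: SlotProvider,
  env: Env,
): Credential {
  const oauth: Credential = { kind: "chatgpt-oauth", path: "~/.codex/auth.json" };

  // Per-slot override: OAuth wins for this tier regardless of ambient keys.
  if (provider === "chatgpt-oauth" || provider === "chatgpt") return oauth;

  // Per-slot key, e.g. AFK_MODEL_SMALL_API_KEY for the "small" slot.
  const slotVar = `AFK_MODEL_${slot.toUpperCase()}_API_KEY`;
  if (env[slotVar]) return { kind: "api-key", source: slotVar };

  // Global chain: an ambient key is found first, so OAuth is never reached.
  for (const name of ["OPENAI_API_KEY", "CODEX_API_KEY"]) {
    if (env[name]) return { kind: "api-key", source: name };
  }

  // Lowest tier: the global flag only matters when no key is set.
  if (env["AFK_OPENAI_CHATGPT_OAUTH"]) return oauth;
  return { kind: "none" };
}
```

With an ambient OPENAI_API_KEY set, a slot without an override resolves to the API key even when the global flag is on, while a slot bound to chatgpt-oauth resolves to the OAuth credential.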

Selecting a model

Set the default for all sessions:

export AFK_MODEL=sonnet          # or: haiku, opus, fable, gpt-5, etc.

Override for a single call:

afk chat "explain this stack trace" --model opus
afk i --model haiku
afk chat "summarise the staged diff" --model gpt-5

Switch mid-session in the REPL:

/model gpt-5.5       # next turn routes to openai-compatible
/model sonnet        # next turn routes back to anthropic-direct

Mid-session model switches work transparently — cost totals and hooks carry over.

Cross-family history caveat: Anthropic thinking blocks and tool-call ID schemas differ between providers. When you switch provider families mid-session, the new model sees prior turns as plain text, not structured tool calls. Same-family switches keep full fidelity.
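To make the caveat concrete, the following sketch shows one way structured history could be reduced to plain text on a cross-family switch. The block and turn shapes, the function names, and the bracketed text format are all hypothetical, and dropping thinking blocks rather than replaying them is an assumption; the real session types may differ.

```typescript
// Hypothetical message shapes, for illustration only.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; text: string }
  | { type: "tool_call"; id: string; name: string; input: unknown }
  | { type: "tool_result"; id: string; output: string };

interface Turn { role: "user" | "assistant"; blocks: Block[] }

// Render one structured block as plain text for the other provider family.
function flattenBlock(block: Block): string {
  switch (block.type) {
    case "text":
      return block.text;
    case "thinking":
      return ""; // assumption: thinking blocks are dropped, not replayed
    case "tool_call":
      return `[tool call ${block.name}: ${JSON.stringify(block.input)}]`;
    case "tool_result":
      return `[tool result: ${block.output}]`;
  }
}

// Collapse a whole turn into a single text message, skipping empty pieces.
function flattenTurn(turn: Turn): { role: Turn["role"]; text: string } {
  const text = turn.blocks
    .map(flattenBlock)
    .filter((piece) => piece !== "")
    .join("\n");
  return { role: turn.role, text };
}
```

The tool-call IDs are deliberately discarded here, which is why the new model cannot treat the earlier calls as structured tool use.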

Forcing a provider

AFK_PROVIDER (and --provider) force a single provider for the whole session, bypassing the per-model heuristic:

export AFK_PROVIDER=openai-compatible   # all models routed to OpenAI compat

Accepted values: anthropic, anthropic-direct, openai, openai-compatible, openai-codex. The --provider CLI flag wins when both are set.

AFK_PROVIDER is now an escape hatch, not a requirement. Omit it to let the router pick automatically per model.
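The precedence between the flag, the environment variable, and the automatic router might look like the sketch below. The function name resolveProvider is hypothetical; mapping anthropic and openai onto the full family names, and throwing on an unrecognised value, are assumptions rather than documented behaviour.

```typescript
type ProviderFamily = "anthropic-direct" | "openai-compatible";

// Accepted values and the family each one is assumed to resolve to.
const PROVIDER_VALUES = new Map<string, ProviderFamily>([
  ["anthropic", "anthropic-direct"],
  ["anthropic-direct", "anthropic-direct"],
  ["openai", "openai-compatible"],
  ["openai-compatible", "openai-compatible"],
  ["openai-codex", "openai-compatible"], // deprecated alias
]);

// Hypothetical resolver: CLI flag, then env var, then the per-model router.
function resolveProvider(
  cliFlag: string | undefined,
  envValue: string | undefined,
  routeByName: () => ProviderFamily,
): ProviderFamily {
  const forced = cliFlag ?? envValue; // --provider wins when both are set
  if (forced === undefined) return routeByName();
  const family = PROVIDER_VALUES.get(forced);
  if (family === undefined) {
    throw new Error(`Unknown provider "${forced}"`);
  }
  return family;
}
```

Passing the router as a callback keeps the forced path from ever consulting the per-model heuristic, which matches the "bypassing" wording above.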

Available models

Alias	Capability	Provider
`fable`	Most capable — Claude Fable 5 (Mythos-class), 1M context	anthropic-direct
`opus` / `opus_1m`	Complex reasoning, multi-step planning, long contexts	anthropic-direct
`sonnet` / `sonnet_1m`	Balanced — the `medium` tier's default model	anthropic-direct
`haiku`	Fast and cheap, best for simple tasks	anthropic-direct

sonnet_1m and opus_1m resolve to the same fixed model ID as their base aliases (sonnet and opus respectively). Like all identity aliases, they are not tiers. The 1M context window is handled by the model itself, so adding the _1m suffix does not change which pinned model ID is selected. Source: src/agent/session/model-slots.ts (DIRECT_MODEL_ALIASES).

You can also pass any raw model ID accepted by the provider (e.g. claude-opus-5, gpt-5, claude-haiku-4-5-20251001).

Local OpenAI-compatible shims (MLX, ollama, llama.cpp, vLLM)

Point AFK_OPENAI_BASE_URL at your local server to use any model served over an OpenAI-compatible API:

# mlx_lm.server, ollama (openai mode), vLLM, LM Studio, llama.cpp
export AFK_OPENAI_BASE_URL=http://127.0.0.1:8080/v1
export OPENAI_API_KEY=local          # placeholder; most shims accept any value
export AFK_MODEL=mlx-community/Qwen3-32B-4bit

afk i

The OpenAI SDK appends /chat/completions itself, so do not include that path in AFK_OPENAI_BASE_URL. A value ending in /chat/completions is stripped at config-load time with a one-shot warning.
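The stripping behaviour can be sketched as a small normaliser. The function name, the warning text, and the tolerance for a trailing slash after the path are assumptions for illustration; only the strip-and-warn-once behaviour comes from the description above.

```typescript
// Matches a trailing /chat/completions, optionally followed by one slash.
const COMPLETIONS_SUFFIX = /\/chat\/completions\/?$/;

let warnedAboutSuffix = false;

// Hypothetical normaliser for AFK_OPENAI_BASE_URL.
function normalizeBaseUrl(raw: string, warn: (msg: string) => void): string {
  if (!COMPLETIONS_SUFFIX.test(raw)) return raw; // already a bare base URL
  if (!warnedAboutSuffix) {
    warnedAboutSuffix = true; // one-shot: warn only the first time
    warn("AFK_OPENAI_BASE_URL should not include /chat/completions; stripping it.");
  }
  return raw.replace(COMPLETIONS_SUFFIX, "");
}
```

Given http://127.0.0.1:8080/v1/chat/completions, this returns http://127.0.0.1:8080/v1, so the SDK's own path suffix is not doubled.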

For different endpoints per capability tier, use per-slot base URLs — see Model Slots.

OpenAI Responses API

The OpenAI-compatible provider uses Chat Completions by default. To opt into the OpenAI Responses API surface instead:

export AFK_OPENAI_USE_RESPONSES=1

The ChatGPT-subscription OAuth path uses Responses automatically regardless of this flag. See API Keys for OAuth details.

Extended thinking and effort

export AFK_THINKING=adaptive      # adaptive | disabled | enabled:<N> | enabled:max
export AFK_EFFORT=medium          # low | medium | high | xhigh | max

AFK_THINKING controls Anthropic extended thinking. AFK_EFFORT is forwarded as an effort hint (model-gated; ignored where unsupported). Both can also be set per-call with --thinking and --effort flags.

OpenAI o-series reasoning effort. For o-series models (o1, o3, o3-mini, o4, and provider/o* IDs), AFK_EFFORT maps to the OpenAI reasoning_effort parameter: low → low, medium → medium, and high/xhigh/max → high. The value is sent on the Chat Completions request (or as reasoning.effort on the Responses API). It is omitted entirely when AFK_EFFORT is unset or the model is not o-series.
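The effort mapping above amounts to a small function. In this sketch the names are hypothetical, and the o-series check (the last path segment starts with o1, o3, or o4) is an assumption about how "provider/o* IDs" are detected.

```typescript
type Effort = "low" | "medium" | "high" | "xhigh" | "max";
type ReasoningEffort = "low" | "medium" | "high";

// Assumption: o-series means the last path segment starts with o1, o3 or o4.
function isOSeries(model: string): boolean {
  const name = model.slice(model.lastIndexOf("/") + 1);
  return /^o[134]/.test(name);
}

// Returns undefined when the parameter should be omitted from the request.
function toReasoningEffort(
  model: string,
  effort: Effort | undefined,
): ReasoningEffort | undefined {
  if (effort === undefined || !isOSeries(model)) return undefined;
  if (effort === "low" || effort === "medium") return effort;
  return "high"; // high, xhigh and max all collapse to high
}
```

Returning undefined rather than a default lets the caller leave the field out of the request body entirely.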

Reliability and retries

The openai-compatible provider retries transient failures automatically. Requests that fail with 429, 500, 502, 503, or 529 are retried up to three times with exponential backoff (2s → 4s → 8s); both the initial connection and a dropped stream get their own retry budget. Non-transient statuses (400, 401, 403, 404) are surfaced immediately without retrying.
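The retry policy can be expressed as a pure decision function. This is a sketch with hypothetical names, not the provider's actual code; it assumes the caller keeps a separate retry counter for the connection phase and for the stream phase.

```typescript
const TRANSIENT_STATUSES = new Set([429, 500, 502, 503, 529]);
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 2000;

type RetryDecision =
  | { retry: true; delayMs: number }
  | { retry: false };

// `retriesSoFar` counts retries already made in this phase (connect or stream).
function decideRetry(status: number, retriesSoFar: number): RetryDecision {
  if (!TRANSIENT_STATUSES.has(status)) return { retry: false }; // e.g. 400, 401
  if (retriesSoFar >= MAX_RETRIES) return { retry: false };     // budget spent
  // 2s, 4s, 8s: the delay doubles with each retry.
  return { retry: true, delayMs: BASE_DELAY_MS * 2 ** retriesSoFar };
}
```

A caller would wait delayMs before the next attempt when retry is true, and surface the error otherwise.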

Output-token caps are threaded into the streaming request body: AFK_MAX_OUTPUT_TOKENS (or the resolved per-model default) is sent as max_tokens, or as max_completion_tokens for o-series models that require it.

Debug: what provider is active?

afk config              # shows resolved model and provider
afk config --format json   # machine-readable, includes raw env vars
afk provider auth diagnose  # targeted credential check per provider
