MODELS

Model tiers for tool-heavy agent use: quality, cost/service, and local Ollama.

06 / 16·guide·v0.9.26

alpi works with any model that speaks the OpenAI tool-calling protocol via LiteLLM, but not every model is a good agent. The important question is not "what scores highest on a benchmark?" but "what model keeps choosing the right tool after 20 turns, with memory, skills, shell commands, browser calls, and user-specific state in context?"

Use this page as a practical selector. Prices, context windows, and provider wrappers move quickly; re-check them every 2-3 months.

Last updated: 2026-06-20.

What matters for alpi

For skill-heavy profiles, model quality mostly shows up in routing:

A cheap model can be excellent for status checks and short commands while still being a bad primary model for a profile with many skills.

External usage signal

Public provider dashboards can be useful when they show model use in tool-heavy agent workloads rather than chat-only benchmarks. Treat that data as a weak signal, not a ranking: defaults, price, rate limits, regional availability, and provider wrappers all bias usage.

Common high-usage models in tool-heavy agent workloads currently include owl-alpha, MiMo V2.5 Pro / V2.5, DeepSeek V4 Pro / Flash, MiniMax M3, Claude Sonnet 4.6 / Opus 4.8, Nemotron 3 Super, and OpenAI GPT-5.5 / 5.4-mini.

Pick by workload

The tables below use OpenRouter routes as the primary ID — a single OPENROUTER_API_KEY covers every entry, and several picks (owl-alpha, MiMo, DeepSeek, MiniMax, Nemotron) only ship via OpenRouter. For Anthropic and OpenAI you can also use native routes if you have those provider keys; the convention is shown under the tables.

Skill router / long tool chains

Use these when the profile has many skills, persistent memory, stateful tools, or real side effects. This is the default category for daily interactive alpi use.

ModelOpenRouter IDWhy
owl-alphaowl-alphaMost-used OpenRouter model for tool-heavy workloads; 1M context, alpha channel — fast iteration, expect occasional wrapper churn.
DeepSeek V4 Prodeepseek/deepseek-v4-proStrong tool discipline at 1M context; sensible flagship-class daily driver.
MiMo V2.5 Proxiaomi/mimo-v2.5-proStrong adoption in persistent-agent workloads; 1M context, good price-for-quality.
MiniMax M3minimax/minimax-m3Mid-tier agent model; 512K context, decent for persistent sessions.
Claude Sonnet 4.6anthropic/claude-sonnet-4.6Premium daily driver; strongest tool discipline and coding judgement at this tier.

If you can only choose one model for a skill-heavy profile, start with owl-alpha, DeepSeek V4 Pro, MiMo V2.5 Pro, or Sonnet 4.6 depending on budget and provider preference.

Cheap service turns

Use these for Telegram gateway traffic, heartbeats, summaries, simple lookups, and low-risk commands. They are not the first choice for creating or debugging skills.

ModelOpenRouter IDWhy
DeepSeek V4 Flashdeepseek/deepseek-v4-flash1M context at the cheap-fast tier; the headroom is useful even for short turns.
MiMo V2.5xiaomi/mimo-v2.5Budget sibling to MiMo V2.5 Pro; 1M context, useful for A/B testing cheap service profiles.
Claude Haiku 4.5anthropic/claude-haiku-4.5Cheap and fast with reasoning support; reliable for short-chain turns.
GPT-5.4 Miniopenai/gpt-5.4-miniReasonable OpenAI budget pick for simple tool use; acceptable as a router only when the skill catalog is clean and small.
GPT-5.4 Nanoopenai/gpt-5.4-nanoCheapest OpenAI tier; mechanical turns only.

High-stakes engineering

Use these when a wrong tool call is expensive: refactors, code review, long debugging sessions, schema changes, release work.

ModelOpenRouter IDWhy
Claude Opus 4.8anthropic/claude-opus-4.8Flagship — expensive ceiling for hard multi-step engineering and long-context judgement.
Claude Sonnet 4.6anthropic/claude-sonnet-4.6Best daily premium balance for coding-heavy profiles.
GPT-5.5openai/gpt-5.5OpenAI flagship; strong general engineering.
GPT-5.5 Proopenai/gpt-5.5-proExtended-reasoning variant for the hardest multi-step tasks.
o3openai/o3Heavy-reasoning specialist; not a router, use for one-shot analysis.
GPT-5.3 Codexopenai/gpt-5.3-codexCoding-specific; good for repo-tooling profiles.
Nemotron 3 Supernvidia/nemotron-3-super-120b-a12bOpen-weight engineering option; 256K context.

Local / sovereign profiles

"Best" here means best inside the local model ecosystem (Ollama, llama.cpp, vLLM), not best overall. The curated catalog does not pin specific local IDs because the field moves fast and the right pick depends on your VRAM budget. Choose from current Qwen-coder, Gemma, codestral, or Mistral families. Expect more prompt sensitivity than cloud frontier models and tighter context windows.

For privacy-constrained work that needs more headroom than your hardware allows, consider an open-weight cloud model like nvidia/nemotron-3-super-120b-a12b (256K) or deepseek/deepseek-v4-pro (1M) — not local, but no proprietary frontier dependency.

Native routes for Anthropic and OpenAI

If you have ANTHROPIC_API_KEY or OPENAI_API_KEY, you can use native routes instead of the OpenRouter aliases. Native usually means lower latency and one less layer to break.

ProviderOpenRouter routeNative route
Anthropicanthropic/claude-opus-4.8claude-opus-4-8 (hyphens, not dots)
Anthropicanthropic/claude-sonnet-4.6claude-sonnet-4-6
Anthropicanthropic/claude-haiku-4.5claude-haiku-4-5
OpenAIopenai/gpt-5.5gpt-5.5 (no prefix)
OpenAIopenai/gpt-5.5-progpt-5.5-pro
OpenAIopenai/gpt-5.4-minigpt-5.4-mini
OpenAIopenai/gpt-5.4-nanogpt-5.4-nano
OpenAIopenai/gpt-5.3-codexgpt-5.3-codex
OpenAIopenai/o3o3

These can still be useful as workers, but they should not be the main model for a profile that depends on skills:

Production setups

alpi currently selects one primary model per profile. fallback_models exists in config, but do not rely on automatic runtime escalation unless your installed version explicitly implements it. Use profiles to split roles today:

Switching model

Three ways, any of them works:

The choice is per-profile. alpi -p work can run Sonnet 4.6 while alpi -p personal runs MiMo V2.5 without interference.

theme