AgentOS provides access to 47 models across 5 performance tiers, spanning frontier reasoning models to cost-effective local options.
Model Tiers
Frontier
Most capable models for complex reasoning, research, and advanced tasks

| Model | Provider | Context | Price (per 1M tokens) | Features |
| --- | --- | --- | --- | --- |
| claude-opus-4-6 | Anthropic | 200K | $15 / $75 | Tools, Vision, 32K output |
| o3 | OpenAI | 200K | $10 / $40 | Reasoning, 100K output |
| gemini-2.5-pro | Google | 1M | $1.25 / $10 | Vision, Code exec, 65K output |
| grok-3 | xAI | 131K | $3 / $15 | Tools, Vision |
| samba-llama-3.1-405b | SambaNova | 4K | $5 / $10 | 405B parameters |

Smart
Advanced capabilities at lower cost - ideal for most agent tasks

| Model | Provider | Context | Price (per 1M tokens) | Features |
| --- | --- | --- | --- | --- |
| claude-sonnet-4-6 | Anthropic | 200K | $3 / $15 | Tools, Vision, 16K output |
| gpt-4o | OpenAI | 128K | $2.5 / $10 | Tools, Vision, JSON mode |
| gpt-4.1 | OpenAI | 1M | $2 / $8 | 1M context, Vision |
| deepseek-reasoner | DeepSeek | 128K | $0.55 / $2.19 | R1 reasoning |
| grok-2 | xAI | 131K | $2 / $10 | Tools, Vision |
| mistral-large | Mistral | 128K | $2 / $6 | Tools, EU hosting |
| sonar-pro | Perplexity | 200K | $3 / $15 | Search-augmented |
| command-a | Cohere | 256K | $2.5 / $10 | RAG-optimized |
| command-r-plus | Cohere | 128K | $3 / $15 | Tools, Grounding |
| jamba-1.5-large | AI21 | 256K | $2 / $8 | 256K context |
| qwen-max | Qwen | 32K | $2.4 / $9.6 | Multilingual |
| glm-4-plus | Zhipu | 128K | $7 / $7 | Chinese/English |
| moonshot-v1-128k | Moonshot | 128K | $8.5 / $8.5 | 128K context |
| ernie-4.0-turbo | Baidu | 128K | $4.2 / $8.4 | Chinese language |
| bedrock-claude-sonnet | AWS Bedrock | 200K | $3 / $15 | AWS integration |
| copilot-gpt-4o | GitHub | 128K | $2.5 / $10 | Code-optimized |
| openrouter-auto | OpenRouter | 128K | Dynamic | Multi-provider routing |

Balanced
Cost-effective general purpose models

| Model | Provider | Context | Price (per 1M tokens) | Features |
| --- | --- | --- | --- | --- |
| o4-mini | OpenAI | 200K | $1.1 / $4.4 | Fast reasoning |
| deepseek-chat | DeepSeek | 128K | $0.14 / $0.28 | Best value |
| llama-3.3-70b | Groq | 131K | $0.59 / $0.79 | Fast inference |
| sonar | Perplexity | 200K | $1 / $1 | Search |
| command-r | Cohere | 128K | $0.5 / $1.5 | RAG |
| cerebras-llama-3.3-70b | Cerebras | 8K | $0.6 / $0.6 | Ultra-fast |
| samba-llama-3.3-70b | SambaNova | 4K | $0.6 / $0.6 | Enterprise |
| hf-llama-3.3-70b | HuggingFace | 128K | $0.36 / $0.36 | Free tier |
| replicate-llama-3.3-70b | Replicate | 128K | $0.65 / $2.75 | Custom deploy |
| qwen-plus | Qwen | 128K | $0.5 / $1.5 | Multilingual |
| abab7-chat | MiniMax | 245K | $1 / $1 | 245K context |
| glm-4 | Zhipu | 128K | $1.4 / $1.4 | Chinese/English |
| moonshot-v1-32k | Moonshot | 32K | $3.3 / $3.3 | Chinese |
| ernie-3.5-turbo | Baidu | 128K | $0.56 / $1.12 | Chinese |
| bedrock-nova-pro | AWS Bedrock | 300K | $0.8 / $3.2 | Amazon Nova |
| bedrock-llama-3.3-70b | AWS Bedrock | 128K | $0.72 / $0.72 | AWS Llama |
| together-llama-3.3-70b | Together | 131K | $0.88 / $0.88 | Open models |
| fireworks-llama-3.3-70b | Fireworks | 131K | $0.9 / $0.9 | Fast hosting |

Fast
Optimized for speed and low latency

| Model | Provider | Context | Price (per 1M tokens) | Features |
| --- | --- | --- | --- | --- |
| claude-haiku-4-5 | Anthropic | 200K | $0.8 / $4 | Tools, Vision, Fast |
| gpt-4o-mini | OpenAI | 128K | $0.15 / $0.6 | Best price/performance |
| gemini-2.5-flash | Google | 1M | $0.15 / $0.6 | Ultra-fast, 1M context |
| grok-3-mini | xAI | 131K | $0.3 / $0.5 | Fast responses |
| jamba-1.5-mini | AI21 | 256K | $0.2 / $0.4 | 256K context |
| qwen-turbo | Qwen | 1M | $0.05 / $0.15 | 1M context, lowest cost |
| hf-mistral-7b | HuggingFace | 32K | Free | Free tier |

Local
Self-hosted models with zero API costs
All local models are free but require local compute resources.
Ollama Models
Run any model from ollama.ai/library:
llama3.3 - Meta Llama 3.3 70B
qwen2.5 - Qwen 2.5 72B
deepseek-r1 - DeepSeek R1 reasoning
mistral - Mistral 7B/Mixtral
phi4 - Microsoft Phi-4
gemma2 - Google Gemma 2
codestral - Mistral code specialist
vLLM
Production-grade self-hosting with GPU optimization
LM Studio
Desktop GUI for running models locally
Model Aliases
AgentOS provides convenient aliases for quick model selection:
const ALIASES = {
  // Tier shortcuts
  "best": "claude-opus-4-6",
  "frontier": "claude-opus-4-6",
  "smart": "claude-sonnet-4-6",
  "fast": "claude-haiku-4-5",
  "cheap": "gpt-4o-mini",
  // Claude shortcuts
  "opus": "claude-opus-4-6",
  "sonnet": "claude-sonnet-4-6",
  "haiku": "claude-haiku-4-5",
  // OpenAI shortcuts
  "gpt4": "gpt-4o",
  "gpt4o": "gpt-4o",
  "gpt41": "gpt-4.1",
  "o3": "o3",
  "o4": "o4-mini",
  // Google shortcuts
  "flash": "gemini-2.5-flash",
  "pro": "gemini-2.5-pro",
  "gemini": "gemini-2.5-flash",
  // Other providers
  "deepseek": "deepseek-chat",
  "ds": "deepseek-chat",
  "r1": "deepseek-reasoner",
  "llama": "llama-3.3-70b",
  "grok": "grok-2",
  "grok3": "grok-3",
  "mistral": "mistral-large",
  "sonar": "sonar-pro",
  "command": "command-a",
  "jamba": "jamba-1.5-large",
  "qwen": "qwen-max",
  "glm": "glm-4-plus",
  "kimi": "moonshot-v1-128k",
  "ernie": "ernie-4.0-turbo",
  "bedrock": "bedrock-claude-sonnet",
  "nova": "bedrock-nova-pro",
  "copilot": "copilot-gpt-4o",
};
Using Aliases
# CLI usage
agentos message default "Hello" --model sonnet
agentos models describe haiku
// Or via API
const result = await trigger('catalog::resolve', { model: 'opus' });
// Returns full model entry for claude-opus-4-6
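The alias lookup itself can be pictured as a plain map lookup. The following is a minimal sketch, not the actual `catalog::resolve` implementation: it uses a small subset of the alias table above, returns only the model ID rather than the full model entry, and assumes that a name which is not an alias is passed through unchanged. The `resolveAlias` helper is hypothetical.

```typescript
// Subset of the alias table above, stored as a Map so that only
// explicitly listed keys can ever match.
const ALIASES = new Map<string, string>([
  ["opus", "claude-opus-4-6"],
  ["sonnet", "claude-sonnet-4-6"],
  ["haiku", "claude-haiku-4-5"],
  ["cheap", "gpt-4o-mini"],
  ["r1", "deepseek-reasoner"],
]);

// Hypothetical helper. Assumption: unknown names are treated as model IDs
// and returned as-is; the real catalog may reject them instead.
function resolveAlias(name: string): string {
  return ALIASES.get(name) ?? name;
}

console.log(resolveAlias("sonnet"));  // claude-sonnet-4-6
console.log(resolveAlias("gpt-4.1")); // gpt-4.1
```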
Model Capabilities
Tool Support
Models that support structured tool invocation:
43 models with tool support
Vision Support
Models that can process images:
Claude Opus 4.6, Sonnet 4.6, Haiku 4.5
GPT-4o, GPT-4.1, o3, o4-mini, GPT-4o mini
Gemini 2.5 Pro, Gemini 2.5 Flash
Grok-2, Grok-3
AWS Bedrock Claude, Nova Pro
GitHub Copilot GPT-4o
Long Context (>100K tokens)
Models with extended context windows:
30+ models with 100K+ context
1M tokens: Gemini 2.5 (Flash, Pro), GPT-4.1, Qwen Turbo
300K tokens: AWS Bedrock Nova Pro
256K tokens: Cohere Command A, AI21 Jamba
245K tokens: MiniMax ABAB 7
200K tokens: All Claude models, OpenAI o3/o4-mini, Perplexity Sonar, Bedrock Claude
128K+: Most other modern models
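When a prompt is large, the context column becomes a hard filter. The sketch below shows one way to shortlist models whose window can hold a given prompt. It is illustrative only: the `modelsThatFit` helper and the `ContextInfo` shape are hypothetical, the list is a hand-picked subset of the tier tables, and "K" and "M" are read as 1,000 and 1,000,000 tokens.

```typescript
interface ContextInfo {
  id: string;
  contextWindow: number; // max input tokens
}

// Hand-picked subset of the tier tables (context sizes as listed there).
const MODELS: ContextInfo[] = [
  { id: "gemini-2.5-flash", contextWindow: 1000000 },
  { id: "bedrock-nova-pro", contextWindow: 300000 },
  { id: "command-a", contextWindow: 256000 },
  { id: "claude-sonnet-4-6", contextWindow: 200000 },
  { id: "gpt-4o", contextWindow: 128000 },
  { id: "qwen-max", contextWindow: 32000 },
];

// Returns the IDs of models whose context window can hold the prompt.
function modelsThatFit(promptTokens: number): string[] {
  return MODELS.filter((m) => m.contextWindow >= promptTokens).map((m) => m.id);
}

console.log(modelsThatFit(250000));
// [ 'gemini-2.5-flash', 'bedrock-nova-pro', 'command-a' ]
```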
Pricing Comparison
Prices shown as input / output per 1 million tokens (USD).
Best Value Models
| Model | Input | Output | Use Case |
| --- | --- | --- | --- |
| qwen-turbo | $0.05 | $0.15 | High-volume, 1M context |
| deepseek-chat | $0.14 | $0.28 | General purpose |
| gpt-4o-mini | $0.15 | $0.6 | Fast OpenAI option |
| gemini-2.5-flash | $0.15 | $0.6 | Google ecosystem |
| jamba-1.5-mini | $0.2 | $0.4 | 256K context |
| hf-llama-3.3-70b | $0.36 | $0.36 | Open source |
| hf-mistral-7b | Free | Free | Development/testing |
Most Expensive (Frontier)
| Model | Input | Output | Justification |
| --- | --- | --- | --- |
| claude-opus-4-6 | $15 | $75 | Most capable reasoning |
| o3 | $10 | $40 | Advanced reasoning |
| gemini-2.5-pro | $1.25 | $10 | 1M context + code exec |
| samba-llama-3.1-405b | $5 | $10 | 405B parameters |
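Because prices are quoted per 1 million tokens, the cost of a request is the token count scaled by the per-million rate, summed over input and output. A minimal sketch of that arithmetic follows. The `estimateCostUsd` helper is hypothetical, and the prices are the ones listed in the tables above.

```typescript
interface Pricing {
  inputPrice: number;  // USD per 1M input tokens
  outputPrice: number; // USD per 1M output tokens
}

// cost = (inputTokens * inputPrice + outputTokens * outputPrice) / 1,000,000
function estimateCostUsd(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPrice + outputTokens * p.outputPrice) / 1000000;
}

const opus: Pricing = { inputPrice: 15, outputPrice: 75 };
const deepseekChat: Pricing = { inputPrice: 0.14, outputPrice: 0.28 };

// A request with 200,000 input tokens and 50,000 output tokens:
console.log(estimateCostUsd(opus, 200000, 50000).toFixed(2));         // 6.75
console.log(estimateCostUsd(deepseekChat, 200000, 50000).toFixed(3)); // 0.042
```

At these list prices the same request costs roughly 160 times more on claude-opus-4-6 than on deepseek-chat.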
Model Selection Guide
By Use Case
Simple Chat
Agent Tasks
Complex Reasoning
Code Generation
RAG/Search
Chinese Content
High Volume
Privacy-Critical
// Simple Chat: use fast tier
model: "claude-haiku-4-5" // or "gpt-4o-mini" or "gemini-2.5-flash"
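Only the Simple Chat recommendation is spelled out above. For the remaining use cases, one plausible starting point is a lookup table like the sketch below. Apart from the Simple Chat entry, these picks are assumptions inferred from the tier descriptions and feature labels elsewhere on this page (for example "ideal for most agent tasks", "Code-optimized", "RAG-optimized"), not recommendations taken from the guide itself. The `UseCase` keys and the `suggestModel` helper are hypothetical.

```typescript
type UseCase =
  | "simple-chat"
  | "agent-tasks"
  | "complex-reasoning"
  | "code-generation"
  | "rag-search"
  | "chinese-content"
  | "high-volume"
  | "privacy-critical";

// Non-empty tuples, so the first entry is always defined.
const SUGGESTED: Record<UseCase, [string, ...string[]]> = {
  "simple-chat": ["claude-haiku-4-5", "gpt-4o-mini", "gemini-2.5-flash"], // from the guide
  "agent-tasks": ["claude-sonnet-4-6"],           // assumption: smart tier
  "complex-reasoning": ["claude-opus-4-6", "o3"], // assumption: frontier tier
  "code-generation": ["copilot-gpt-4o"],          // assumption: "Code-optimized"
  "rag-search": ["sonar-pro", "command-a"],       // assumption: search/RAG labels
  "chinese-content": ["glm-4-plus", "ernie-4.0-turbo"], // assumption: Chinese labels
  "high-volume": ["qwen-turbo"],                  // assumption: "High-volume, 1M context"
  "privacy-critical": ["llama3.3"],               // assumption: local model via Ollama
};

// Returns the first suggestion for a use case.
function suggestModel(useCase: UseCase): string {
  return SUGGESTED[useCase][0];
}

console.log(suggestModel("simple-chat")); // claude-haiku-4-5
console.log(suggestModel("high-volume")); // qwen-turbo
```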
By Budget
Under $0.50/1M tokens: Qwen Turbo, DeepSeek Chat, HF models, local (free)
$0.50-$2/1M: Most balanced tier models
$2-$5/1M: Smart tier models
Over $5/1M: Frontier models (use sparingly)
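The budget bands above can be expressed as a small classifier. This is a sketch under stated assumptions: the guide does not say whether the bands refer to input or output price, so the hypothetical `budgetBand` helper uses the input price, and a price exactly on a boundary is placed in the lower band.

```typescript
type BudgetBand = "under $0.50" | "$0.50-$2" | "$2-$5" | "over $5";

// Assumption: bands are keyed on the input price in USD per 1M tokens.
function budgetBand(inputPricePerMillion: number): BudgetBand {
  if (inputPricePerMillion < 0.5) return "under $0.50";
  if (inputPricePerMillion <= 2) return "$0.50-$2";
  if (inputPricePerMillion <= 5) return "$2-$5";
  return "over $5";
}

console.log(budgetBand(0.14)); // under $0.50  (deepseek-chat)
console.log(budgetBand(0.8));  // $0.50-$2     (claude-haiku-4-5)
console.log(budgetBand(3));    // $2-$5        (claude-sonnet-4-6)
console.log(budgetBand(15));   // over $5      (claude-opus-4-6)
```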
CLI Usage
# List all models
agentos models list
# Filter by tier
agentos models list --tier smart
# Filter by provider
agentos models list --provider anthropic
# Filter by capability
agentos models list --tools
agentos models list --vision
# Describe a model
agentos models describe claude-sonnet-4-6
# Output:
# Model: claude-sonnet-4-6
# Provider: Anthropic
# Tier: smart
# Context: 200,000 tokens
# Max Output: 16,000 tokens
# Pricing: $3.00 / $15.00 per 1M tokens
# Capabilities: tools, vision
Programmatic Access
import { trigger } from 'iii-sdk';
// List all models
const models = await trigger('catalog::models', {});
// Filter by tier
const smartModels = await trigger('catalog::models', { tier: 'smart' });
// Filter by provider
const anthropicModels = await trigger('catalog::models', { provider: 'anthropic' });
// Resolve model or alias
const model = await trigger('catalog::resolve', { model: 'sonnet' });
// Returns full ModelEntry with pricing, capabilities, etc.
// List providers
const providers = await trigger('catalog::providers', {});
// Returns provider configs with availability status
// Get aliases
const aliases = await trigger('catalog::aliases', {});
HTTP API
# List models
curl http://localhost:3111/api/models
# List providers
curl http://localhost:3111/api/providers
# Get aliases
curl http://localhost:3111/api/models/aliases
# Test provider
curl -X POST http://localhost:3111/api/providers/anthropic/test
Each model entry includes:
interface ModelEntry {
  id: string;              // Model identifier
  provider: string;        // Provider name
  name: string;            // Display name
  tier: "frontier" | "smart" | "balanced" | "fast" | "local";
  contextWindow: number;   // Max input tokens
  maxOutput: number;       // Max output tokens
  inputPrice: number;      // $ per 1M input tokens
  outputPrice: number;     // $ per 1M output tokens
  supportsTools: boolean;  // Function calling
  supportsVision: boolean; // Image input
  local: boolean;          // Self-hosted
}
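Once model entries are in hand, the same filters the CLI exposes can be applied client-side. The sketch below is a minimal illustration, not the catalog's own filtering code. The `ModelFilter` shape and `filterModels` helper are hypothetical, loosely mirroring the `--tier`, `--provider`, `--tools` and `--vision` flags, and the two sample entries are assembled by hand from the tables and CLI output on this page.

```typescript
type Tier = "frontier" | "smart" | "balanced" | "fast" | "local";

interface ModelEntry {
  id: string;
  provider: string;
  name: string;
  tier: Tier;
  contextWindow: number;
  maxOutput: number;
  inputPrice: number;
  outputPrice: number;
  supportsTools: boolean;
  supportsVision: boolean;
  local: boolean;
}

// Hypothetical filter shape; every field is optional.
interface ModelFilter {
  tier?: Tier;
  provider?: string;
  tools?: boolean;
  vision?: boolean;
}

// Keeps entries that satisfy every filter field that is set.
function filterModels(models: ModelEntry[], filter: ModelFilter): ModelEntry[] {
  const provider = filter.provider?.toLowerCase();
  return models.filter(
    (m) =>
      (filter.tier === undefined || m.tier === filter.tier) &&
      (provider === undefined || m.provider.toLowerCase() === provider) &&
      (!filter.tools || m.supportsTools) &&
      (!filter.vision || m.supportsVision)
  );
}

// Sample entries built by hand from the values on this page.
const sample: ModelEntry[] = [
  { id: "claude-opus-4-6", provider: "Anthropic", name: "Claude Opus 4.6", tier: "frontier",
    contextWindow: 200000, maxOutput: 32000, inputPrice: 15, outputPrice: 75,
    supportsTools: true, supportsVision: true, local: false },
  { id: "claude-sonnet-4-6", provider: "Anthropic", name: "Claude Sonnet 4.6", tier: "smart",
    contextWindow: 200000, maxOutput: 16000, inputPrice: 3, outputPrice: 15,
    supportsTools: true, supportsVision: true, local: false },
];

console.log(filterModels(sample, { tier: "smart" }).map((m) => m.id));
// [ 'claude-sonnet-4-6' ]
console.log(filterModels(sample, { provider: "anthropic", vision: true }).length);
// 2
```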
Next Steps
Provider Details: Learn about each provider's setup
Routing Logic: Understand automatic model selection