Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/iii-hq/agentos/llms.txt

Use this file to discover all available pages before exploring further.

AgentOS supports 25 LLM providers spanning frontier labs, cloud platforms, specialized inference providers, and local deployment options.

Provider Categories

Leading AI research labs with state-of-the-art models:
  • Anthropic: Claude Opus, Sonnet, Haiku
  • OpenAI: GPT-4o, GPT-4.1, o3, o4-mini
  • Google: Gemini 2.5 Flash, Gemini 2.5 Pro
Enterprise cloud AI services:
  • AWS Bedrock: Claude, Nova, Titan, Llama (via AWS)
Optimized for low-latency, high-throughput inference:
  • Groq: Llama 3.3 70B, Mixtral
  • DeepSeek: V3, R1 (reasoning model)
  • Cerebras: Llama 3.3 70B (ultra-fast)
  • SambaNova: Llama 3.1 405B, Llama 3.3 70B
  • Fireworks: Llama 3.3 70B
  • Together: Open-source models
  • vLLM: Self-hosted fast inference
Multi-provider routing and access:
  • OpenRouter: Access 100+ models through one API
  • HuggingFace: Inference API for open models
Domain-specific or regional models:
  • Perplexity: Sonar (search-augmented)
  • Cohere: Command A, Command R+
  • xAI: Grok-2, Grok-3
  • Mistral: Large, Medium, Small
  • Replicate: Open-source model hosting
  • AI21: Jamba 1.5
Leading Chinese language models:
  • Qwen (Alibaba): Qwen Max, Plus, Turbo
  • Minimax: ABAB 7 Chat
  • Zhipu AI: GLM-4, GLM-4 Plus
  • Moonshot: Kimi (128K context)
  • Baidu: ERNIE 4.0, ERNIE 3.5
Self-hosted, privacy-first options:
  • Ollama: Run Llama, Qwen, Mistral locally
  • LM Studio: Desktop GUI for local models
  • vLLM: Production self-hosting

Provider Details

Anthropic

Anthropic Claude

Base URL: https://api.anthropic.com
API Key: ANTHROPIC_API_KEY
Driver: Native Anthropic SDK
Models:
  • claude-opus-4-6 - Frontier reasoning (15/15/75 per 1M tokens)
  • claude-sonnet-4-6 - Smart general purpose (3/3/15 per 1M tokens)
  • claude-haiku-4-5 - Fast responses (0.8/0.8/4 per 1M tokens)
Features: Tool use, vision, 200K context, streaming

OpenAI

OpenAI GPT

Base URL: https://api.openai.com/v1
API Key: OPENAI_API_KEY
Driver: OpenAI-compatible
Models:
  • gpt-4o - Multimodal flagship (2.5/2.5/10 per 1M tokens)
  • gpt-4.1 - 1M context window (2/2/8 per 1M tokens)
  • o3 - Advanced reasoning (10/10/40 per 1M tokens)
  • o4-mini - Fast reasoning (1.1/1.1/4.4 per 1M tokens)
  • gpt-4o-mini - Cost-effective (0.15/0.15/0.6 per 1M tokens)
Features: Tool use, vision, JSON mode, function calling

Google Gemini

Google Gemini

Base URL: https://generativelanguage.googleapis.com
API Key: GEMINI_API_KEY
Driver: Gemini-specific
Models:
  • gemini-2.5-pro - Frontier multimodal (1.25/1.25/10 per 1M tokens)
  • gemini-2.5-flash - Ultra-fast (0.15/0.15/0.6 per 1M tokens)
Features: 1M context, vision, code execution, grounding

AWS Bedrock

AWS Bedrock

Base URL: https://bedrock-runtime.us-east-1.amazonaws.com
API Key: AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY
Driver: Bedrock SDK
Models:
  • bedrock-claude-sonnet - Claude Sonnet 4 via AWS
  • bedrock-nova-pro - Amazon Nova Pro (300K context)
  • bedrock-llama-3.3-70b - Llama 3.3 70B
Features: Enterprise compliance, VPC isolation, AWS integration

DeepSeek

DeepSeek

Base URL: https://api.deepseek.com/v1
API Key: DEEPSEEK_API_KEY
Driver: OpenAI-compatible
Models:
  • deepseek-chat - Balanced performance (0.14/0.14/0.28 per 1M tokens)
  • deepseek-reasoner - R1 reasoning model (0.55/0.55/2.19 per 1M tokens)
Features: 128K context, tool use, competitive pricing

Groq

Groq

Base URL: https://api.groq.com/openai/v1
API Key: GROQ_API_KEY
Driver: OpenAI-compatible
Models:
  • llama-3.3-70b - Ultra-fast Llama inference (0.59/0.59/0.79 per 1M tokens)
Features: 131K context, 500+ tok/sec, low latency

Mistral AI

Mistral AI

Base URL: https://api.mistral.ai/v1
API Key: MISTRAL_API_KEY
Driver: OpenAI-compatible
Models:
  • mistral-large - Flagship European model (2/2/6 per 1M tokens)
Features: 128K context, tool use, European data residency

Together AI

Together AI

Base URL: https://api.together.xyz/v1
API Key: TOGETHER_API_KEY
Driver: OpenAI-compatible
Models:
  • together-llama-3.3-70b - Open-source Llama (0.88/0.88/0.88 per 1M tokens)
Features: Open models, custom fine-tuning, fast inference

Fireworks AI

Fireworks AI

Base URL: https://api.fireworks.ai/inference/v1
API Key: FIREWORKS_API_KEY
Driver: OpenAI-compatible
Models:
  • fireworks-llama-3.3-70b - Fast Llama hosting (0.9/0.9/0.9 per 1M tokens)
Features: Sub-second latency, function calling

Cohere

Cohere

Base URL: https://api.cohere.ai/v1
API Key: COHERE_API_KEY
Driver: OpenAI-compatible
Models:
  • command-a - Latest flagship (2.5/2.5/10 per 1M tokens)
  • command-r-plus - RAG-optimized (3/3/15 per 1M tokens)
  • command-r - Balanced (0.5/0.5/1.5 per 1M tokens)
Features: 256K context, RAG, grounded generation

Perplexity

Perplexity AI

Base URL: https://api.perplexity.ai
API Key: PERPLEXITY_API_KEY
Driver: OpenAI-compatible
Models:
  • sonar-pro - Search-augmented answers (3/3/15 per 1M tokens)
  • sonar - Balanced search (1/1/1 per 1M tokens)
Features: Real-time search, citations, 200K context

xAI

xAI Grok

Base URL: https://api.x.ai/v1
API Key: XAI_API_KEY
Driver: OpenAI-compatible
Models:
  • grok-3 - Frontier reasoning (3/3/15 per 1M tokens)
  • grok-2 - Smart general purpose (2/2/10 per 1M tokens)
  • grok-3-mini - Fast responses (0.3/0.3/0.5 per 1M tokens)
Features: 131K context, tool use, X integration

Replicate

Replicate

Base URL: https://api.replicate.com/v1
API Key: REPLICATE_API_TOKEN
Driver: OpenAI-compatible
Models:
  • replicate-llama-3.3-70b - Llama 3.3 70B Instruct
Features: Run any open model, custom deployments

Ollama (Local)

Ollama

Base URL: http://localhost:11434/v1
API Key: Not required
Driver: OpenAI-compatible
Models: Any model from ollama.ai/library
  • llama3.3, qwen2.5, deepseek-r1, mistral, phi4, etc.
Features: Fully local, no API costs, privacy-first, offline

vLLM (Local)

vLLM

Base URL: http://localhost:8000/v1
API Key: Not required
Driver: OpenAI-compatible
Features: Production self-hosting, GPU optimization, PagedAttention

LM Studio (Local)

LM Studio

Base URL: http://localhost:1234/v1
API Key: Not required
Driver: OpenAI-compatible
Features: Desktop GUI, one-click setup, model library

OpenRouter

OpenRouter

Base URL: https://openrouter.ai/api/v1
API Key: OPENROUTER_API_KEY
Driver: OpenAI-compatible
Models:
  • openrouter-auto - Automatic routing across 100+ models
Features: Unified API, cost optimization, fallback routing

HuggingFace

HuggingFace

Base URL: https://api-inference.huggingface.co
API Key: HF_API_KEY
Driver: OpenAI-compatible
Models:
  • hf-llama-3.3-70b - Llama 3.3 70B Instruct
  • hf-mistral-7b - Mistral 7B (free tier)
Features: Free tier, 1000+ models, serverless inference

AI21 Labs

AI21 Labs

Base URL: https://api.ai21.com/studio/v1
API Key: AI21_API_KEY
Driver: OpenAI-compatible
Models:
  • jamba-1.5-large - 256K context (2/2/8 per 1M tokens)
  • jamba-1.5-mini - Fast variant (0.2/0.2/0.4 per 1M tokens)
Features: 256K context, structured outputs

Cerebras

Cerebras

Base URL: https://api.cerebras.ai/v1
API Key: CEREBRAS_API_KEY
Driver: OpenAI-compatible
Models:
  • cerebras-llama-3.3-70b - Ultra-fast Llama (0.6/0.6/0.6 per 1M tokens)
Features: 1800+ tok/sec, wafer-scale engine

SambaNova

SambaNova

Base URL: https://api.sambanova.ai/v1
API Key: SAMBANOVA_API_KEY
Driver: OpenAI-compatible
Models:
  • samba-llama-3.1-405b - Largest Llama (5/5/10 per 1M tokens)
  • samba-llama-3.3-70b - Balanced (0.6/0.6/0.6 per 1M tokens)
Features: Enterprise hardware, tool use

Qwen (Alibaba)

Qwen

Base URL: https://dashscope.aliyuncs.com/compatible-mode/v1
API Key: DASHSCOPE_API_KEY
Driver: OpenAI-compatible
Models:
  • qwen-max - Flagship model (2.4/2.4/9.6 per 1M tokens)
  • qwen-plus - Balanced (0.5/0.5/1.5 per 1M tokens)
  • qwen-turbo - Fast, 1M context (0.05/0.05/0.15 per 1M tokens)
Features: 1M context, multilingual, code generation

MiniMax

MiniMax

Base URL: https://api.minimax.chat/v1
API Key: MINIMAX_API_KEY
Driver: OpenAI-compatible
Models:
  • abab7-chat - ABAB 7 (1/1/1 per 1M tokens)
Features: 245K context, Chinese language

Zhipu AI

Zhipu AI (GLM)

Base URL: https://open.bigmodel.cn/api/paas/v4
API Key: ZHIPU_API_KEY
Driver: OpenAI-compatible
Models:
  • glm-4-plus - Advanced (7/7/7 per 1M tokens)
  • glm-4 - Balanced (1.4/1.4/1.4 per 1M tokens)
Features: 128K context, Chinese/English bilingual

Moonshot (Kimi)

Moonshot

Base URL: https://api.moonshot.cn/v1
API Key: MOONSHOT_API_KEY
Driver: OpenAI-compatible
Models:
  • moonshot-v1-128k - 128K context (8.5/8.5/8.5 per 1M tokens)
  • moonshot-v1-32k - 32K context (3.3/3.3/3.3 per 1M tokens)
Features: Long context, Chinese language

Baidu Qianfan

Baidu ERNIE

Base URL: https://aip.baidubce.com/rpc/2.0
API Key: QIANFAN_API_KEY
Driver: OpenAI-compatible
Models:
  • ernie-4.0-turbo - Advanced (4.2/4.2/8.4 per 1M tokens)
  • ernie-3.5-turbo - Balanced (0.56/0.56/1.12 per 1M tokens)
Features: 128K context, Chinese language, Baidu ecosystem

GitHub Copilot

GitHub Copilot

Base URL: https://api.githubcopilot.com
API Key: GITHUB_TOKEN
Driver: OpenAI-compatible
Models:
  • copilot-gpt-4o - GPT-4o via Copilot subscription
Features: Included with Copilot, code-optimized

Provider Selection

Choose providers based on:
  • Cost: Local (free) → Fast inference → Cloud APIs → Frontier labs
  • Latency: Groq, Cerebras → vLLM → Standard APIs
  • Privacy: Ollama, vLLM, LM Studio (100% local)
  • Compliance: AWS Bedrock (SOC2, HIPAA, FedRAMP)
  • Language: Chinese models for Chinese content
  • Features: Tool use, vision, long context

Testing Providers

# Test provider reachability
agentos models providers

# Check API key configuration
echo $ANTHROPIC_API_KEY

# Test with CLI
agentos message default "Hello" --model claude-haiku-4-5

Next Steps

Model Catalog

Browse all 47 models with pricing

Routing Logic

Learn complexity-based selection