OpenRouter

A unified API gateway for large language models that lets you call 100+ LLMs from different providers through a single OpenAI-compatible endpoint with automatic fallback and cost routing.

OpenRouter is an API aggregator for LLMs. Instead of managing separate API keys for OpenAI, Anthropic, Google, Mistral, and 20 other providers, you use one OpenRouter API key and specify the model by name. OpenRouter handles routing, authentication, and billing.

Core Value Proposition

  • Single API key — one Authorization header for all providers
  • OpenAI-compatible — point existing OpenAI SDK code at https://openrouter.ai/api/v1
  • Model catalog — 100+ models: Claude, GPT-4o, Gemini, Llama 3, Mistral, Qwen, and more
  • Fallback routing — if a provider has an outage, route to a backup model automatically
  • Cost routing — automatically pick the cheapest model that meets your capability requirements
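
// Same code, different model: only baseURL and the model string change
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY, // assumed env var holding your OpenRouter key
});

const response = await openai.chat.completions.create({
  model: "anthropic/claude-3.5-sonnet", // or "openai/gpt-4o"
  messages: [{ role: "user", content: "Hello" }],
});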
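
The fallback idea in the list above can also be approximated on the client side. What follows is a minimal sketch, not OpenRouter's own routing logic: it tries a list of models in order against the chat completions endpoint and returns the first reply that succeeds. The helper names (buildRequest, completeWithFallback, fetchSender) are hypothetical, and the response shape is assumed to follow the OpenAI chat format.

```typescript
// Client-side fallback sketch. All helper names are hypothetical and not
// part of any SDK; the endpoint path follows the OpenAI-compatible layout.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

type ChatRequest = {
  url: string;
  headers: Record<string, string>;
  body: string;
};

const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

// Build an OpenAI-style chat completion request for one model.
function buildRequest(apiKey: string, model: string, messages: ChatMessage[]): ChatRequest {
  return {
    url: `${OPENROUTER_BASE_URL}/chat/completions`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}

// A sender performs the HTTP call and resolves to the reply text, or throws.
type Sender = (req: ChatRequest) => Promise<string>;

// Try each model in order and return the first successful reply.
async function completeWithFallback(
  apiKey: string,
  models: string[],
  messages: ChatMessage[],
  send: Sender,
): Promise<{ model: string; reply: string }> {
  let lastError: unknown = new Error("no models given");
  for (const model of models) {
    try {
      const reply = await send(buildRequest(apiKey, model, messages));
      return { model, reply };
    } catch (err) {
      lastError = err; // remember the failure and move on to the next model
    }
  }
  throw lastError;
}

// One possible sender built on fetch (available in Node 18+ and browsers).
const fetchSender: Sender = async (req) => {
  const res = await fetch(req.url, { method: "POST", headers: req.headers, body: req.body });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
};
```

Passing the sender in as a parameter keeps the retry loop independent of the HTTP layer, so it can be exercised with a stub before any real request is made. A call might look like completeWithFallback(key, ["anthropic/claude-3.5-sonnet", "openai/gpt-4o"], messages, fetchSender).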

Use Cases

  • Model experimentation — A/B test different models without integration work
  • Cost optimization — route simple tasks to cheap models, complex tasks to capable ones
  • Reliability — automatic failover during provider outages
  • Startups — avoid being locked into a single provider before you know which model fits
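
The cost optimization case can be sketched as a small routing function that decides which model name to send. This is an illustration under stated assumptions: the two tiers, the 2000-character threshold, and the needsReasoning flag are hypothetical, and the model names are only examples of catalog entries.

```typescript
// Hypothetical two-tier router: cheap model for simple tasks, capable model
// for long or reasoning-heavy ones. The threshold is illustrative, not tuned.
type Task = { prompt: string; needsReasoning: boolean };
type Tier = "cheap" | "capable";

const MODEL_BY_TIER: Record<Tier, string> = {
  cheap: "openai/gpt-4o-mini", // example low-cost model
  capable: "openai/gpt-4o",    // example higher-capability model
};

const LONG_PROMPT_CHARS = 2000;

// Classify a task using two simple signals: prompt length and a caller hint.
function classify(task: Task): Tier {
  const longPrompt = task.prompt.length > LONG_PROMPT_CHARS;
  return task.needsReasoning || longPrompt ? "capable" : "cheap";
}

// Return the model name to place in the request's `model` field.
function pickModel(task: Task): string {
  return MODEL_BY_TIER[classify(task)];
}
```

Because every model sits behind the same endpoint, the result of pickModel is the only thing that changes between requests. The same function is also a natural place to assign users to A/B variants.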

vs Self-Hosting

OpenRouter uses hosted models: your data goes to the underlying provider, with OpenRouter as an intermediary. For privacy-sensitive workloads, Ollama (local) or vLLM (self-hosted) keep data on your infrastructure. OpenRouter trades privacy control for convenience and breadth of model access.
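
Since Ollama and vLLM can both serve an OpenAI-compatible API, the choice between hosted and local can come down to which base URL a request is sent to. The sketch below assumes default local ports (11434 for Ollama, 8000 for vLLM) and a hypothetical sensitive flag supplied by the caller; adjust the URLs to match your deployment.

```typescript
// Endpoint selection sketch. The local URLs assume default ports and an
// OpenAI-compatible server running on this machine; they are assumptions.
type Endpoint = { name: string; baseURL: string; dataLeavesInfra: boolean };
type EndpointKey = "openrouter" | "ollama" | "vllm";

const ENDPOINTS: Record<EndpointKey, Endpoint> = {
  openrouter: { name: "OpenRouter", baseURL: "https://openrouter.ai/api/v1", dataLeavesInfra: true },
  ollama: { name: "Ollama", baseURL: "http://localhost:11434/v1", dataLeavesInfra: false },
  vllm: { name: "vLLM", baseURL: "http://localhost:8000/v1", dataLeavesInfra: false },
};

// Send sensitive workloads to a local server and everything else to OpenRouter.
function endpointFor(sensitive: boolean, local: "ollama" | "vllm" = "ollama"): Endpoint {
  return sensitive ? ENDPOINTS[local] : ENDPOINTS.openrouter;
}
```

The returned baseURL can be passed to the same OpenAI-style client used for OpenRouter, so the calling code stays the same and only the destination of the data changes.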

Pricing

OpenRouter charges a small markup (typically 0–20%) on top of provider rates. For high-volume production workloads, going direct to providers may be cheaper. For experimentation and low-to-medium volume, OpenRouter's convenience often outweighs the markup.
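
The trade-off is easy to put in numbers. The following sketch uses made-up figures (a provider rate of $3 per million tokens and a 5% markup) purely to show the arithmetic; substitute your own volume, rate, and markup.

```typescript
// Cost comparison sketch. All figures passed in are hypothetical inputs.
type CostBreakdown = { direct: number; markup: number; total: number };

// tokens: tokens used per month
// usdPerMillionTokens: the provider's rate
// markupPercent: the aggregator's markup, e.g. 5 for 5%
function monthlyCost(tokens: number, usdPerMillionTokens: number, markupPercent: number): CostBreakdown {
  const direct = (tokens / 1e6) * usdPerMillionTokens; // cost when calling the provider directly
  const markup = (direct * markupPercent) / 100;       // extra paid for going through the gateway
  return { direct, markup, total: direct + markup };
}

const lowVolume = monthlyCost(50e6, 3, 5);  // 50 million tokens per month
const highVolume = monthlyCost(2e9, 3, 5);  // 2 billion tokens per month
```

With these inputs, the markup is $7.50 per month at 50 million tokens and $300 per month at 2 billion tokens. The percentage is the same in both cases, but only the second amount is large enough that a direct integration might pay for itself.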

Related

  • vLLM — self-hosted inference server for full control
  • Ollama — local model runner, no cloud dependency
  • Inference Endpoint — the general concept of an API serving model predictions
  • LangChain — orchestration framework that integrates with OpenRouter
