Service Tiers

Control cost and latency tradeoffs with service tier selection

The service_tier parameter lets you control cost and latency tradeoffs when sending requests through OpenRouter. You can pass it in your request to select a specific processing tier, and the response will indicate which tier was actually used.

Not every model from a provider supports service tiers. Additionally, your requested service tier is not guaranteed to be honored — the provider may serve your request on a different tier depending on availability. The service_tier field in the response indicates which tier was actually used, and you will be billed according to that actual tier.
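As a minimal sketch of this flow, the snippet below builds a request body carrying a `service_tier` and then checks the tier the response reports. The model slug, helper names, and the sample response are illustrative placeholders, not real API output:

```python
# Sketch: request a service tier, then check which tier actually served
# the request. Helper names and the sample response are illustrative.

def build_request(model, messages, service_tier):
    """Build a Chat Completions request body that asks for a tier."""
    return {"model": model, "messages": messages, "service_tier": service_tier}

def actual_tier(response):
    """Read the tier the provider actually used (may differ from the request)."""
    return response.get("service_tier")

body = build_request(
    "openai/gpt-4o",  # hypothetical model slug
    [{"role": "user", "content": "Hello"}],
    "flex",
)

# The provider may fall back to a different tier; billing follows the
# tier reported in the response, not the one requested.
sample_response = {"id": "gen-123", "service_tier": "default"}  # illustrative
if actual_tier(sample_response) != body["service_tier"]:
    print(f"Served on {actual_tier(sample_response)!r}, "
          f"requested {body['service_tier']!r}")
```

Comparing the requested and returned values like this is one way to log when a request was served on a cheaper or slower tier than intended.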

Supported Providers

OpenAI

  • Accepted request values: auto, default, flex, priority (default if omitted: auto)
  • Possible response values: default, flex, priority

Learn more in OpenAI’s Chat Completions and Responses API documentation. See OpenAI’s pricing page for details on cost differences between tiers.
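Since only the four values above are accepted on OpenAI requests, it can be worth validating the tier client-side before sending. The helper below is our own sketch, not part of any SDK, and the accepted set is taken from the list above:

```python
# Sketch: client-side validation of an OpenAI service_tier value.
# The accepted request values come from the list above.
OPENAI_REQUEST_TIERS = {"auto", "default", "flex", "priority"}

def with_openai_tier(body, tier="auto"):
    """Return a copy of the request body with a validated service_tier."""
    if tier not in OPENAI_REQUEST_TIERS:
        raise ValueError(f"unsupported OpenAI service_tier: {tier!r}")
    return {**body, "service_tier": tier}
```

Failing fast here surfaces typos like `"priorty"` locally instead of as a rejected API call.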

Google (Vertex AI)

  • Accepted request values: standard, flex, priority (default if omitted: standard)
  • Possible response values: standard, flex, priority

Learn more in Google’s Flex and Priority documentation.

API Response Differences

The API response includes a service_tier field that indicates which capacity tier was actually used to serve your request. The placement of this field varies by API format:

  • Chat Completions API (/api/v1/chat/completions) and Responses API (/api/v1/responses): service_tier is returned at the top level of the response object, matching OpenAI’s native format.
  • Messages API (/api/v1/messages): service_tier is returned inside the usage object, matching Anthropic’s native format.
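A small helper can normalize these two placements when your code consumes more than one API format. This is a sketch of our own; the sample responses below are illustrative, not real API output:

```python
# Sketch: normalize where service_tier lives across API formats.
# Top-level for Chat Completions / Responses, inside usage for Messages.
def get_service_tier(response, api_format):
    """Return the tier actually used, regardless of API format."""
    if api_format in ("chat_completions", "responses"):
        return response.get("service_tier")                    # OpenAI-style
    if api_format == "messages":
        return response.get("usage", {}).get("service_tier")   # Anthropic-style
    raise ValueError(f"unknown API format: {api_format!r}")

# Illustrative response fragments, not real API output.
chat_resp = {"id": "gen-1", "service_tier": "priority"}
msg_resp = {"id": "msg-1", "usage": {"input_tokens": 10, "service_tier": "standard"}}
print(get_service_tier(chat_resp, "chat_completions"))  # priority
print(get_service_tier(msg_resp, "messages"))           # standard
```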