Vision Models
OpenAI: GPT-5.2-Codex
GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering a...
AllenAI: Molmo2 8B (free)
Molmo2-8B is an open vision-language model developed by the Allen Institute for AI (Ai2) a...
Google: Gemini 3 Flash Preview
Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic wor...
Z.AI: GLM 4.6V
GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and l...
Amazon: Nova 2 Lite
Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can proc...
Mistral: Ministral 3 14B 2512
The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities ...
Mistral: Ministral 3 8B 2512
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny l...
Mistral: Ministral 3 3B 2512
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny...
Google: Nano Banana Pro (Gemini 3 Pro Image Preview)
Nano Banana Pro is Googleβs most advanced image-generation and editing model, built on G...
Google: Gemini 3 Pro Preview
Gemini 3 Pro is Googleβs flagship frontier model for high-precision multimodal reasoning...
OpenAI: GPT-5.1-Codex
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and c...
NVIDIA: Nemotron Nano 12B 2 VL (free)
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model design...
NVIDIA: Nemotron Nano 12B 2 VL
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model design...
Qwen: Qwen3 VL 32B Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-...
OpenAI: GPT-5 Image Mini
GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini]...
Qwen: Qwen3 VL 8B Thinking
Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal mode...
Qwen: Qwen3 VL 8B Instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built...
OpenAI: GPT-5 Image
[GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with state...
Google: Gemini 2.5 Flash Image (Nano Banana)
Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of...
Qwen: Qwen3 VL 30B A3B Thinking
Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with v...
Qwen: Qwen3 VL 30B A3B Instruct
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with v...
Qwen: Qwen3 VL 235B A22B Thinking
Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with...
Qwen: Qwen3 VL 235B A22B Instruct
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text ge...
OpenAI: GPT-5 Codex
GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and codin...
StepFun: Step3
Step3 is a cutting-edge multimodal reasoning modelβbuilt on a Mixture-of-Experts archite...
Baidu: ERNIE 4.5 VL 28B A3B
A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B...
Z.AI: GLM 4.5V
GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on...
ByteDance: UI-TARS 7B
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, in...
xAI: Grok 4
Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel to...
Baidu: ERNIE 4.5 VL 424B A47B
ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baiduβs ERNIE...
Mistral: Mistral Small 3.2 24B
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimiz...
Google: Gemma 3n 4B (free)
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, s...
Google: Gemma 3n 4B
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, s...
Arcee AI: Spotlight
Spotlight is a 7βbillionβparameter visionβlanguage model derived from Qwenβ―2.5βV...
Meta: Llama Guard 4 12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for conte...
OpenAI: o3
o3 is a well-rounded and powerful model across domains. It sets a new standard for math, s...
OpenAI: GPT-4.1 Mini
GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substa...
Meta: Llama 4 Maverick
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Met...
Meta: Llama 4 Scout
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by...
Qwen: Qwen2.5 VL 32B Instruct
Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement lear...
Mistral: Mistral Small 3.1 24B (free)
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring...
Mistral: Mistral Small 3.1 24B
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring...
Google: Gemma 3 4B (free)
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It ha...
Google: Gemma 3 4B
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It ha...
Google: Gemma 3 12B (free)
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It ha...
Google: Gemma 3 12B
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It ha...
Google: Gemma 3 27B (free)
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It ha...
Google: Gemma 3 27B
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It ha...
OpenAI: o3 Mini High
OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort ...
Qwen: Qwen VL Plus
Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recogniti...
Qwen: Qwen2.5 VL 72B Instruct
Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and i...
OpenAI: o3 Mini
OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, part...
MiniMax: MiniMax-01
MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image u...
Amazon: Nova Lite 1.0
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focused on fast ...
OpenAI: GPT-4o (2024-11-20)
The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more na...
Mistral: Pixtral Large 2411
Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral ...
Anthropic: Claude 3.5 Sonnet
New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, a...
Meta: Llama 3.2 11B Vision Instruct
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle ...
Mistral: Pixtral 12B
The first multi-modal, text+image-to-text model from Mistral AI. Its weights were launched...
Qwen: Qwen2.5-VL 7B Instruct (free)
Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: ...
Qwen: Qwen2.5-VL 7B Instruct
Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: ...
OpenAI: GPT-4o (2024-08-06)
The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with t...
OpenAI: GPT-4o-mini (2024-07-18)
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting...
OpenAI: GPT-4o-mini
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting...
OpenAI: GPT-4o (2024-05-13)
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs...
OpenAI: GPT-4o
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs...
OpenAI: GPT-4o (extended)
GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs...
OpenAI: GPT-4 Turbo
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mo...
OpenAI: GPT-4 Turbo (older v1106)
The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mo...