Best AI Models for Multimodal

nvidia

Free/1M

NVIDIA: Nemotron 3.5 Content Safety (free)

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model fr...

📝 128,000 ctx

minimax

$0.30/1M

MiniMax: MiniMax M3

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and vid...

📝 1,048,576 ctx

stepfun

$0.20/1M

StepFun: Step 3.7 Flash

Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It...

📝 256,000 ctx

google

$1.50/1M

Google: Gemini 3.5 Flash

Gemini 3.5 Flash is Google's high-efficiency multimodal model, bringing near-Pro level cod...

📝 1,048,576 ctx

google

$0.25/1M

Google: Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google's GA high-efficiency multimodal model optimized for low-...

📝 1,048,576 ctx

nvidia

Free/1M

NVIDIA: Nemotron 3 Nano Omni (free)

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as ...

📝 256,000 ctx

qwen

$0.30/1M

Qwen: Qwen3.5 Plus 2026-04-20

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It acce...

📝 1,000,000 ctx

qwen

$0.14/1M

Qwen: Qwen3.6 35B A3B

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion tota...

📝 262,144 ctx

qwen

$0.29/1M

Qwen: Qwen3.6 27B

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, ...

📝 262,144 ctx

xiaomi

$0.14/1M

Xiaomi: MiMo-V2.5

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance...

📝 1,048,576 ctx

openai

$8.00/1M

OpenAI: GPT-5.4 Image 2

[GPT-5.4](https://openrouter.ai/openai/gpt-5.4) Image 2 combines OpenAI's GPT-5.4 model wi...

📝 272,000 ctx

moonshotai

Free/1M

MoonshotAI: Kimi K2.6 (free)

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon cod...

📝 262,144 ctx

moonshotai

$0.68/1M

MoonshotAI: Kimi K2.6

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon cod...

📝 262,144 ctx

google

Free/1M

Google: Gemma 4 31B (free)

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and...

📝 262,144 ctx

google

$0.12/1M

Google: Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and...

📝 262,144 ctx

z-ai

$1.20/1M

Z.ai: GLM 5V Turbo

GLM-5V-Turbo is Z.ai's first native multimodal agent foundation model, built for vision-...

📝 202,752 ctx

rekaai

$0.10/1M

Reka Edge

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image...

📝 16,384 ctx

bytedance-seed

$0.25/1M

ByteDance Seed: Seed-2.0-Lite

Seed-2.0-Lite is a versatile, cost-efficient enterprise workhorse that delivers strong m...

📝 262,144 ctx

qwen

$0.04/1M

Qwen: Qwen3.5-9B

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver s...

📝 262,144 ctx

bytedance-seed

$0.10/1M

ByteDance Seed: Seed-2.0-Mini

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, e...

📝 262,144 ctx

google

$2.00/1M

Google: Gemini 3.1 Pro Preview

Gemini 3.1 Pro Preview is Google's frontier reasoning model, delivering enhanced softwar...

📝 1,048,576 ctx

moonshotai

$0.40/1M

MoonshotAI: Kimi K2.5

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual cod...

📝 262,144 ctx

bytedance-seed

$0.08/1M

ByteDance Seed: Seed 1.6 Flash

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporti...

📝 262,144 ctx

bytedance-seed

$0.25/1M

ByteDance Seed: Seed 1.6

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates m...

📝 262,144 ctx

z-ai

$0.30/1M

Z.ai: GLM 4.6V

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and l...

📝 131,072 ctx

anthropic

$5.00/1M

Anthropic: Claude Opus 4.5

Claude Opus 4.5 is Anthropic's frontier reasoning model optimized for complex software e...

📝 200,000 ctx

google

$2.00/1M

Google: Nano Banana Pro (Gemini 3 Pro Image Preview)

Nano Banana Pro is Google's most advanced image-generation and editing model, built on G...

📝 65,536 ctx

amazon

$2.50/1M

Amazon: Nova Premier 1.0

Amazon Nova Premier is the most capable of Amazon's multimodal models for complex reason...

📝 1,000,000 ctx

nvidia

Free/1M

NVIDIA: Nemotron Nano 12B 2 VL (free)

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model design...

📝 128,000 ctx

qwen

$0.10/1M

Qwen: Qwen3 VL 32B Instruct

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-...

📝 262,144 ctx

openai

$2.50/1M

OpenAI: GPT-5 Image Mini

GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini]...

📝 400,000 ctx

qwen

$0.12/1M

Qwen: Qwen3 VL 8B Thinking

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal mode...

📝 256,000 ctx

qwen

$0.08/1M

Qwen: Qwen3 VL 8B Instruct

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built...

📝 256,000 ctx

qwen

$0.13/1M

Qwen: Qwen3 VL 30B A3B Thinking

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with v...

📝 131,072 ctx

qwen

$0.13/1M

Qwen: Qwen3 VL 30B A3B Instruct

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with v...

📝 262,144 ctx

qwen

$0.26/1M

Qwen: Qwen3 VL 235B A22B Thinking

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with...

📝 131,072 ctx

qwen

$0.20/1M

Qwen: Qwen3 VL 235B A22B Instruct

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text ge...

📝 262,144 ctx

baidu

$0.14/1M

Baidu: ERNIE 4.5 VL 28B A3B

A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B...

📝 131,072 ctx

z-ai

$0.60/1M

Z.ai: GLM 4.5V

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on...

📝 65,536 ctx

openai

$1.25/1M

OpenAI: GPT-5 Chat

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations ...

📝 128,000 ctx

bytedance

$0.10/1M

ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, in...

📝 128,000 ctx

baidu

$0.42/1M

Baidu: ERNIE 4.5 VL 424B A47B

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu's ERNIE...

📝 131,072 ctx

google

$0.06/1M

Google: Gemma 3n 4B

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, s...

📝 32,768 ctx

mistralai

$0.40/1M

Mistral: Mistral Medium 3

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver...

📝 131,072 ctx

arcee-ai

$0.18/1M

Arcee AI: Spotlight

Spotlight is a 7-billion-parameter vision-language model derived from Qwen 2.5-VL ...

📝 131,072 ctx

meta-llama

$0.18/1M

Meta: Llama Guard 4 12B

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for conte...

📝 163,840 ctx

openai

$1.10/1M

OpenAI: o4 Mini

OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-effi...

📝 200,000 ctx

meta-llama

$0.15/1M

Meta: Llama 4 Maverick

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Met...

📝 1,048,576 ctx

meta-llama

$0.08/1M

Meta: Llama 4 Scout

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by...

📝 10,000,000 ctx

mistralai

$0.35/1M

Mistral: Mistral Small 3.1 24B

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring...

📝 128,000 ctx

google

$0.04/1M

Google: Gemma 3 4B

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It ha...

📝 131,072 ctx

google

$0.04/1M

Google: Gemma 3 12B

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It ha...

📝 131,072 ctx

google

$0.08/1M

Google: Gemma 3 27B

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It ha...

📝 131,072 ctx

amazon

$0.06/1M

Amazon: Nova Lite 1.0

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon focused on fast ...

📝 300,000 ctx

amazon

$0.80/1M

Amazon: Nova Pro 1.0

Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combi...

📝 300,000 ctx

meta-llama

$0.25/1M

Meta: Llama 3.2 11B Vision Instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle ...

📝 131,072 ctx

anthropic

$0.25/1M

Anthropic: Claude 3 Haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsivene...

📝 200,000 ctx

openai

$30.00/1M

OpenAI: GPT-4

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solvi...

📝 8,191 ctx
