bytedance

ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement...

Input Cost: $0.10 per 1M tokens

Output Cost: $0.20 per 1M tokens

Context Window: 128,000 tokens
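As a rough illustration of how the listed prices combine, the sketch below estimates the cost of a single request. It assumes billing is linear in token count at the rates above ($0.10 per 1M input tokens, $0.20 per 1M output tokens). It also assumes that input and output tokens together must fit in the 128,000-token context window. It does not model any image-specific or minimum charges, which this listing does not describe. The function name `estimate_cost_usd` is hypothetical.

```python
# Hypothetical cost estimator for bytedance/ui-tars-1.5-7b.
# Assumes linear per-token billing at the listed rates; image surcharges,
# minimums, or other fees (if any) are not modeled.

INPUT_USD_PER_M = 0.10    # listed input cost per 1M tokens
OUTPUT_USD_PER_M = 0.20   # listed output cost per 1M tokens
CONTEXT_WINDOW = 128_000  # listed context window, in tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request.

    Assumption: prompt and completion tokens together must fit in the
    context window. This is not stated in the listing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the 128,000-token context window")
    # Convert per-1M-token prices into a per-request total.
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    # Example: 100,000 input tokens plus 20,000 output tokens.
    print(f"${estimate_cost_usd(100_000, 20_000):.4f}")
```

For 100,000 input and 20,000 output tokens, the arithmetic is 100,000 × $0.10/1M + 20,000 × $0.20/1M = $0.010 + $0.004 = $0.014.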


Developer ID: bytedance/ui-tars-1.5-7b

Related Models

google

$0.14/1M

Google: Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and...

Context window: 262,144 tokens

z-ai

$1.20/1M

Z.ai: GLM 5V Turbo

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-...

Context window: 202,752 tokens

rekaai

$0.10/1M

Reka Edge

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image...

Context window: 16,384 tokens