bytedance

ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it builds on the UI-TARS framework with reinforcement...

Input Cost: $0.10 per 1M tokens
Output Cost: $0.20 per 1M tokens
Context Window: 128,000 tokens
Developer ID: bytedance/ui-tars-1.5-7b
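
The listed rates translate into a per-request cost with simple arithmetic. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost_usd` is hypothetical, the token counts are assumed to come from whatever usage figures your provider reports (image inputs are assumed to be already counted as tokens), and the context window is assumed to cover input and output together.

```python
# Rates and context window copied from the listing above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.20
CONTEXT_WINDOW_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        # Assumption: the context window covers prompt plus completion.
        raise ValueError("request exceeds the listed context window")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost_usd(100_000, 2_000):.4f}")
```

Under these assumptions, input tokens dominate the bill for screenshot-heavy GUI sessions, since output is typically a short action while each step resends a large prompt.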

Related Models

google
$0.14/1M

Google: Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and...

Context Window: 262,144 tokens

z-ai
$1.20/1M

Z.ai: GLM 5V Turbo

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-...

Context Window: 202,752 tokens

rekaai
$0.10/1M

Reka Edge

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image...

Context Window: 16,384 tokens