bytedance
ByteDance: UI-TARS 7B
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement...
Input Cost: $0.10 per 1M tokens
Output Cost: $0.20 per 1M tokens
Context Window: 128,000 tokens
Developer ID: bytedance/ui-tars-1.5-7b
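To make the listed rates concrete, here is a minimal sketch of estimating the cost of one request from the prices and context window above. The function names are hypothetical, and it assumes that input and output tokens share the 128,000-token window and that the token counts already include any image input. The listing confirms neither assumption.

```python
# Values copied from the listing: USD per 1M tokens, and the context window.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.20
CONTEXT_WINDOW_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: input and output tokens count against the same window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    cost = estimate_cost_usd(100_000, 2_000)
    print(f"estimated cost: ${cost:.4f}")
    print(f"fits in context: {fits_context(100_000, 2_000)}")
```

At these rates, a request that fills most of the context window costs roughly one cent, so for agent loops that resend screenshots and history each step, the number of steps is likely to dominate the total.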
Related Models
google
$0.14/1M
Google: Gemma 4 31B
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and...
z-ai
$1.20/1M
Z.ai: GLM 5V Turbo
GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-...
rekaai
$0.10/1M
Reka Edge
Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image...