bytedance

ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

Input Cost
$0.10
per 1M tokens
Output Cost
$0.20
per 1M tokens
Context Window
128,000
tokens
Compare vs GPT-4o
Developer ID: bytedance/ui-tars-1.5-7b

Related Models

liquid
Free/1M

LiquidAI: LFM2.5-1.2B-Thinking (free)

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks,...

📝 32,768 ctx Compare →
openai
$1.75/1M

OpenAI: GPT-5.2-Codex

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering a...

📝 400,000 ctx Compare →
allenai
$0.20/1M

AllenAI: Olmo 3.1 32B Instruct

Olmo 3.1 32B Instruct is a large-scale, 32-billion-parameter instruction-tuned language mo...

📝 65,536 ctx Compare →