UI-Venus-1.5 Technical Report
Venus Team, Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, Beitong Zhou, Xingran Zhou, Weizhi Chen, Sunhao Dai, Jingya Dou, Yichen Gong, Yuan Guo, Zhenlin Guo, Feng Li, Qian Li, Jinzhen Lin, Yuqi Zhou, Linchao Zhu

TL;DR
UI-Venus-1.5 is a comprehensive GUI agent that leverages large-scale pretraining, online reinforcement learning, and model merging to achieve state-of-the-art performance and robustness across diverse digital environments.
Contribution
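The paper introduces three key technical advances over the previous UI-Venus version: a Mid-Training stage on 10 billion tokens drawn from 30+ datasets, online reinforcement learning with full-trajectory rollouts, and a single unified GUI agent obtained through model merging. Together these are intended to improve both generality and task performance.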
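To make the model-merging idea concrete, the following is a minimal sketch of one common merging scheme, linear weight averaging of parameter sets that share the same architecture. This summary does not say which merging method UI-Venus-1.5 uses, so the scheme itself, the function name `merge_linear`, and the toy parameter dictionaries are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def merge_linear(state_dicts: Sequence[StateDict],
                 weights: Sequence[float]) -> StateDict:
    """Merge checkpoints by a weighted average of each parameter.

    Illustrative only: linear averaging is one possible merging scheme,
    assumed here because the source does not specify the method.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    if len(state_dicts) != len(weights):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the merged parameters stay on the same scale.
    norm = [w / total for w in weights]

    keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != keys:
            raise ValueError("checkpoints must share parameter names")

    merged: StateDict = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        length = len(tensors[0])
        if any(len(t) != length for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across checkpoints.
        merged[name] = [
            sum(w * t[i] for w, t in zip(norm, tensors))
            for i in range(length)
        ]
    return merged


# Toy example: two hypothetical specialist checkpoints, equally weighted.
grounding_ckpt = {"layer.weight": [0.0, 2.0]}
navigation_ckpt = {"layer.weight": [2.0, 4.0]}
unified = merge_linear([grounding_ckpt, navigation_ckpt], [1.0, 1.0])
```

With real checkpoints the same loop would run over framework tensors rather than Python lists, and the per-checkpoint weights would typically be tuned on held-out tasks.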
Findings
Achieves new state-of-the-art on multiple benchmarks.
Demonstrates robust real-world navigation in Chinese mobile apps.
Outperforms previous models significantly.
Abstract
GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to meet various downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent…
Code & Models
- 🤗 inclusionAI/UI-Venus-1.5-2B (model) · 2.7k dl · ♡ 35
- 🤗 inclusionAI/UI-Venus-1.5-8B (model) · 4.2k dl · ♡ 24
- 🤗 inclusionAI/UI-Venus-1.5-30B-A3B (model) · 3.4k dl · ♡ 23
- 🤗 mlx-community/UI-Venus-1.5-8B-bf16 (model) · 13 dl
- 🤗 mlx-community/UI-Venus-1.5-8B-6bit (model) · 10 dl
- 🤗 mlx-community/UI-Venus-1.5-8B-4bit (model) · 21 dl
- 🤗 mlx-community/UI-Venus-1.5-2B-bf16 (model) · 17 dl
- 🤗 mlx-community/UI-Venus-1.5-2B-6bit (model) · 17 dl
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Topic Modeling
