Ovis2.5 Technical Report
Shiyin Lu, Yang Li, Yu Xia, Yuwei Hu, Shanshan Zhao, Yanqing Ma, Zhichao Wei, Yinglun Li, Lunhao Duan, Jianshan Zhao, Yuxuan Han, Haijun Li, Wanying Chen, Junke Tang, Chengkun Hou, Zhixing Du, Tianli Zhou, Wenjie Zhang, Huping Ding, Jiahe Li, Wen Li, Gui Hu, Yiliang Gu

TL;DR
Ovis2.5 is a multimodal vision-language model with native-resolution perception and advanced reasoning, achieving state-of-the-art results among open-source models and excelling at complex visual tasks.
Contribution
The paper introduces Ovis2.5, featuring native-resolution vision processing, reflection-based reasoning, a comprehensive training curriculum, and open-source models that outperform previous open models.
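Native-resolution vision processing means the encoder patches each image close to its original width and height. It does not squash the image to a fixed square or cut it into fixed-size tiles. The sketch below shows the general idea under stated assumptions. The patch size (14), the 2x2 patch merge, the pixel bounds, and both function names are hypothetical illustration choices, not Ovis2.5's published configuration. The point is that the patch grid follows the image's aspect ratio, while a fixed-resolution baseline always yields the same square grid.

```python
import math


def native_patch_grid(width: int, height: int, patch: int = 14, merge: int = 2,
                      min_pixels: int = 448 * 448,
                      max_pixels: int = 1792 * 1792) -> tuple[int, int]:
    """Return the (columns, rows) patch grid for an image kept near its native size.

    Hypothetical parameters: patch, merge, min_pixels and max_pixels are
    illustrative values, not the model's actual preprocessing settings.
    """
    unit = patch * merge  # each side must be divisible by patch * merge
    # First try: snap each side to the nearest multiple of `unit`.
    w = max(unit, round(width / unit) * unit)
    h = max(unit, round(height / unit) * unit)
    if w * h > max_pixels:
        # Too large: shrink both sides by the same factor, then floor-snap.
        scale = math.sqrt(max_pixels / (width * height))
        w = max(unit, math.floor(width * scale / unit) * unit)
        h = max(unit, math.floor(height * scale / unit) * unit)
    elif w * h < min_pixels:
        # Too small: enlarge both sides by the same factor, then ceil-snap.
        scale = math.sqrt(min_pixels / (width * height))
        w = math.ceil(width * scale / unit) * unit
        h = math.ceil(height * scale / unit) * unit
    return w // patch, h // patch


def fixed_resize_grid(tile: int = 448, patch: int = 14) -> tuple[int, int]:
    """Fixed-resolution baseline: every image is resized to tile x tile."""
    return tile // patch, tile // patch


if __name__ == "__main__":
    for size in [(1000, 500), (4000, 3000)]:
        cols, rows = native_patch_grid(*size)
        print(size, "->", (cols, rows), "aspect", round(cols / rows, 2))
    print("fixed baseline ->", fixed_resize_grid())
```

Under these assumptions a 1000x500 chart keeps its 2:1 shape as a 72x36 patch grid (648 visual tokens after a 2x2 merge). The fixed baseline maps every input to the same 32x32 grid, which distorts wide or tall images.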
Findings
Ovis2.5-9B scores 78.3 on the OpenCompass leaderboard.
Ovis2.5-2B achieves 73.9, the state of the art for its size.
Ovis2.5 excels in STEM, grounding, video, and complex chart tasks.
Abstract
We present Ovis2.5, a successor to Ovis2 designed for native-resolution visual perception and strong multimodal reasoning. Ovis2.5 integrates a native-resolution vision transformer that processes images at their native, variable resolutions, avoiding the degradation from fixed-resolution tiling and preserving both fine detail and global layout -- crucial for visually dense content like complex charts. To strengthen reasoning, we train the model to move beyond linear chain-of-thought and perform reflection -- including self-checking and revision. This advanced capability is exposed as an optional "thinking mode" at inference time, allowing users to trade latency for enhanced accuracy on difficult inputs. The model is trained via a comprehensive five-phase curriculum that progressively builds its skills. The process begins with foundational visual and multimodal pretraining, advances…
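The abstract describes thinking mode as an optional inference-time switch that trades latency for accuracy. The toy decoding loop below illustrates that trade under stated assumptions. When thinking is on, the loop runs an extra reasoning phase before the answer and caps it at a token budget, so more budget means more tokens and therefore more latency. The `<think>`/`</think>` delimiters, the `budget` cap, `generate`, and `toy_model` are all hypothetical; this excerpt does not describe Ovis2.5's actual decoding interface.

```python
from typing import Callable, List

# Hypothetical delimiters and end-of-sequence marker (assumptions, not the model's tokens).
THINK_OPEN, THINK_CLOSE, EOS = "<think>", "</think>", "<eos>"


def generate(next_token: Callable[[List[str]], str], prompt: List[str],
             thinking: bool, budget: int = 256, max_answer: int = 128) -> List[str]:
    """Greedy decoding with an optional, budget-capped reasoning phase."""
    out = list(prompt)
    if thinking:
        out.append(THINK_OPEN)
        for _ in range(budget):
            tok = next_token(out)
            if tok == THINK_CLOSE:  # the model ended its reasoning on its own
                break
            out.append(tok)
        out.append(THINK_CLOSE)  # close the phase (forced if the budget ran out)
    for _ in range(max_answer):  # answer phase
        tok = next_token(out)
        if tok == EOS:
            break
        out.append(tok)
    return out


def toy_model(ctx: List[str]) -> str:
    """Stand-in for a real model: reasons until cut off, then answers in one token."""
    if THINK_OPEN in ctx and THINK_CLOSE not in ctx:
        return "step"
    return EOS if ctx[-1] == "answer" else "answer"


if __name__ == "__main__":
    print(generate(toy_model, ["q"], thinking=False))
    print(generate(toy_model, ["q"], thinking=True, budget=3))
```

With thinking off, the toy model answers immediately. With thinking on, it spends up to `budget` extra tokens reasoning before it answers.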
Code & Models
- 🤗 AIDC-AI/Ovis2.6-30B-A3B · model · 61k dl · ♡ 142
- 🤗 AIDC-AI/Ovis2.5-2B · model · 116k dl · ♡ 200
- 🤗 AIDC-AI/Ovis2.5-9B · model · 3.2k dl · ♡ 304
- 🤗 ViFortune-AI/VOVis2.5-2B-pt · model · 6 dl
- 🤗 wsbagnsv1/Ovis2.5-9B-sinq-4bit-experimental · model · 9 dl
- 🤗 wsbagnsv1/Ovis2.5-2B-sinq-4bit-experimental · model · 3 dl
- 🤗 PositronicLlama/Ovis2.5-9B · model · 50 dl
- 🤗 cyankiwi/Ovis2.6-30B-A3B-AWQ-4bit · model · 89 dl
- 🤗 cyankiwi/Ovis2.6-30B-A3B-AWQ-8bit · model · 197 dl · ♡ 1
- 🤗 GggLiu/Ovis2.6-30B-A3B · model
Taxonomy
Topics: Engineering, Applied Research
