HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

Qize Yang; Shimin Yao; Weixuan Chen; Shenghao Fu; Detao Bai; Jiaxing Zhao; Boyuan Sun; Bowen Yin; Xihan Wei; Jingren Zhou

arXiv:2506.21277 · cs.CV · June 27, 2025

TL;DR

This paper introduces HumanOmniV2, a multimodal reasoning model enhanced with global context understanding and reward mechanisms, achieving superior performance on complex intent and emotion comprehension tasks.

Contribution

The paper presents a novel approach integrating global context reasoning, reward-based training, and a new benchmark for omni-modal understanding, advancing multimodal LLM capabilities.

Findings

01. Outperforms existing open-source models on omni-modal benchmarks.

02. Addresses the two problems identified in existing multimodal reasoning models: insufficient global context understanding and shortcut reasoning.

03. Improves reasoning accuracy through reward-based training.

Abstract

With the rapid evolution of multimodal large language models, the capacity to deeply understand and interpret human intentions has emerged as a critical capability, which demands detailed and thoughtful reasoning. In recent studies, Reinforcement Learning (RL) has demonstrated potential in enhancing the reasoning capabilities of Large Language Models (LLMs). Nonetheless, the challenges associated with adapting RL to multimodal data and formats remain largely unaddressed. In this paper, we identify two issues in existing multimodal reasoning models: insufficient global context understanding and shortcut problems. Insufficient context understanding can happen when a model misinterprets multimodal context, resulting in incorrect answers. The shortcut problem occurs when the model overlooks crucial clues in multimodal inputs, directly addressing the query without considering the multimodal…

Code & Models

Repositories

humanmllm/humanomniv2 (PyTorch, official)

Models

PhilipC/HumanOmniV2 (Hugging Face model · 115 downloads · 19 likes)

Taxonomy

Topics: Semantic Web and Ontologies