HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
Qize Yang, Shimin Yao, Weixuan Chen, Shenghao Fu, Detao Bai, Jiaxing Zhao, Boyuan Sun, Bowen Yin, Xihan Wei, Jingren Zhou

TL;DR
This paper introduces HumanOmniV2, a multimodal reasoning model enhanced with global context understanding and reward mechanisms, achieving superior performance on complex intent and emotion comprehension tasks.
Contribution
The paper presents a novel approach integrating global context reasoning, reward-based training, and a new benchmark for omni-modal understanding, advancing multimodal LLM capabilities.
Findings
Outperforms existing open-source models on omni-modal benchmarks.
Effectively addresses the problems of insufficient global context understanding and shortcut reasoning.
Enhances reasoning accuracy with reward-based training mechanisms.
Abstract
With the rapid evolution of multimodal large language models, the capacity to deeply understand and interpret human intentions has emerged as a critical capability, which demands detailed and thoughtful reasoning. In recent studies, Reinforcement Learning (RL) has demonstrated potential in enhancing the reasoning capabilities of Large Language Models (LLMs). Nonetheless, the challenges associated with adapting RL to multimodal data and formats remain largely unaddressed. In this paper, we identify two issues in existing multimodal reasoning models: insufficient global context understanding and shortcut problems. Insufficient context understanding can happen when a model misinterprets multimodal context, resulting in incorrect answers. The shortcut problem occurs when the model overlooks crucial clues in multimodal inputs, directly addressing the query without considering the multimodal…
