Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method
Xinshen Zhang, Zhen Ye, Xu Zheng

TL;DR
This paper introduces OmniVQA, the first dataset and benchmark for omnidirectional visual question answering. It exposes the limitations of current models and proposes 360-R1, a rule-based reinforcement learning method, to improve understanding of 360-degree images.
Contribution
The paper provides the first dataset and benchmark for omnidirectional VQA and proposes a new RL-based method, 360-R1, tailored for panoramic scene reasoning.
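The title describes 360-R1 as GRPO-based with rule-based rewards. As a minimal sketch of the general GRPO idea only (not the authors' implementation), the snippet below scores a group of sampled answers with a rule and normalizes each reward against the group mean and standard deviation. The exact-match rule, the function names, the sample answers, and the use of the population standard deviation are all assumptions made for illustration; the reward rules actually used by 360-R1 are not given in this summary.

```python
from statistics import mean, pstdev


def rule_based_reward(predicted: str, reference: str) -> float:
    """Hypothetical rule: 1.0 for a case-insensitive exact match, else 0.0."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (the core GRPO idea).

    Population standard deviation is an assumption here; eps avoids division
    by zero when every answer in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical answers sampled for one question whose reference is "left".
answers = ["left", "behind", "Left ", "right"]
rewards = [rule_based_reward(a, "left") for a in answers]
advantages = group_relative_advantages(rewards)
```

Answers that beat the group average receive a positive advantage and the rest a negative one, so no separate learned value model is needed to produce a training signal.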
Findings
State-of-the-art MLLMs struggle with omnidirectional VQA tasks.
360-R1 outperforms existing methods with a +6% accuracy improvement.
OmniVQA highlights key challenges in object localization and feature extraction in 360-degree images.
Abstract
Omnidirectional images (ODIs), with their 360° field of view, provide unparalleled spatial awareness for immersive applications like augmented reality and embodied AI. However, the capability of existing multi-modal large language models (MLLMs) to comprehend and reason about such panoramic scenes remains underexplored. This paper addresses this gap by introducing OmniVQA, the first dataset for omnidirectional visual question answering, and by conducting the first benchmark on the task. Our evaluation of state-of-the-art MLLMs reveals significant limitations in handling omnidirectional visual question answering, highlighting persistent challenges in object localization, feature extraction, and hallucination suppression within panoramic contexts. These results underscore the disconnect between current MLLM capabilities and the demands of omnidirectional visual understanding, which calls for…
Taxonomy
Topics: Semantic Web and Ontologies
