Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method

Xinshen Zhang; Zhen Ye; Xu Zheng

arXiv:2505.14197·cs.CV·May 21, 2025

TL;DR

This paper introduces OmniVQA, a pioneering dataset and benchmark for omnidirectional visual question answering, revealing current model limitations and proposing a novel rule-based reinforcement learning method, 360-R1, to improve understanding of 360-degree images.

Contribution

The paper provides the first dataset and benchmark for omnidirectional VQA and proposes a new RL-based method, 360-R1, tailored for panoramic scene reasoning.
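
As a rough illustration of the "group relative" idea behind GRPO (Group Relative Policy Optimization), which the title names as the basis of 360-R1, the sketch below normalizes the rewards of several sampled answers to the same question against their group mean and standard deviation. This is a minimal sketch under stated assumptions: the function name, the 0/1 reward values, and the use of the population standard deviation are illustrative choices, not details taken from the paper, and 360-R1's actual reward functions are not described in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of answers sampled for the same question.

    Hypothetical helper: each advantage is (reward - group mean) / (group std + eps).
    Population std is an assumption; implementations may use the sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rule-based rewards: 1.0 for a correct answer, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Answers that score above their group's average receive a positive advantage and those below receive a negative one, so no separate learned value model is needed to form the baseline.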

Findings

1. State-of-the-art MLLMs struggle with omnidirectional VQA tasks.

2. 360-R1 outperforms existing methods with a +6% accuracy improvement.

3. OmniVQA highlights key challenges in object localization and feature extraction in 360° images.

Abstract

Omnidirectional images (ODIs), with their 360° field of view, provide unparalleled spatial awareness for immersive applications like augmented reality and embodied AI. However, the capability of existing multi-modal large language models (MLLMs) to comprehend and reason about such panoramic scenes remains underexplored. This paper addresses this gap by introducing OmniVQA, the first dataset for omnidirectional visual question answering, and by conducting the first benchmark on the task. Our evaluation of state-of-the-art MLLMs reveals significant limitations in handling omnidirectional visual question answering, highlighting persistent challenges in object localization, feature extraction, and hallucination suppression within panoramic contexts. These results underscore the disconnect between current MLLM capabilities and the demands of omnidirectional visual understanding, which calls for…

Taxonomy

Topics: Semantic Web and Ontologies