360° Image Perception with MLLMs: A Comprehensive Benchmark and a Training-Free Method
Huyen T. T. Tran, Van-Quang Nguyen, Farros Alferro, Kang-Jun Liu, Takayuki Okatani

TL;DR
This paper introduces 360Bench, a comprehensive benchmark for 360-degree image perception with MLLMs, evaluates existing models, and proposes Free360, a training-free scene-graph-based framework that enhances 360-degree visual question answering.
Contribution
The paper presents a new benchmark, 360Bench, systematically evaluates MLLMs on 360-degree images, and introduces Free360, a novel training-free method for improved 360-degree VQA.
Findings
The seven MLLMs and six enhancement methods evaluated on 360Bench show shortcomings in 360-degree image perception.
Free360 consistently improves base MLLMs' performance.
The benchmark reveals specific shortcomings in current models.
Abstract
Multimodal Large Language Models (MLLMs) have shown impressive abilities in understanding and reasoning over conventional images. However, their perception of 360° images remains largely underexplored. Unlike conventional images, 360° images capture the entire surrounding environment, enabling holistic spatial reasoning but introducing challenges such as geometric distortion and complex spatial relations. To comprehensively assess MLLMs' capabilities to perceive 360° images, we introduce 360Bench, a Visual Question Answering (VQA) benchmark featuring 7K-resolution 360° images, seven representative (sub)tasks with annotations carefully curated by human annotators. Using 360Bench, we systematically evaluate seven MLLMs and six enhancement methods, revealing their shortcomings in 360° image perception. To address these challenges, we propose Free360, a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
