360° Image Perception with MLLMs: A Comprehensive Benchmark and a Training-Free Method

Huyen T. T. Tran; Van-Quang Nguyen; Farros Alferro; Kang-Jun Liu; Takayuki Okatani

arXiv:2603.16179 · cs.CV · March 27, 2026

Open Access

TL;DR

This paper introduces 360Bench, a comprehensive benchmark for 360-degree image perception with MLLMs, evaluates existing models, and proposes Free360, a training-free scene-graph-based framework that enhances 360-degree visual question answering.

Contribution

The paper makes three contributions: a new benchmark, 360Bench; a systematic evaluation of MLLMs on 360-degree images; and Free360, a training-free method for improved 360-degree VQA.

Findings

01

MLLMs show limitations in 360-degree image perception.

02

Free360 consistently improves base MLLMs' performance.

03

The benchmark reveals specific shortcomings in current models.

Abstract

Multimodal Large Language Models (MLLMs) have shown impressive abilities in understanding and reasoning over conventional images. However, their perception of 360° images remains largely underexplored. Unlike conventional images, 360° images capture the entire surrounding environment, enabling holistic spatial reasoning but introducing challenges such as geometric distortion and complex spatial relations. To comprehensively assess MLLMs' capabilities to perceive 360° images, we introduce 360Bench, a Visual Question Answering (VQA) benchmark featuring 7K-resolution 360° images and seven representative (sub)tasks with annotations carefully curated by human annotators. Using 360Bench, we systematically evaluate seven MLLMs and six enhancement methods, revealing their shortcomings in 360° image perception. To address these challenges, we propose Free360, a…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques