Panoramic Affordance Prediction

Zixin Zhang; Chenfei Liao; Hongfei Zhang; Harold Haodong Chen; Kanghao Chen; Zichen Wen; Litao Guo; Bin Ren; Xu Zheng; Yinchuan Li; Xuming Hu; Nicu Sebe; Ying-Cong Chen

arXiv:2603.15558 · cs.CV · March 17, 2026

TL;DR

This paper introduces Panoramic Affordance Prediction, which uses 360-degree imagery and a new dataset to improve holistic scene understanding and affordance prediction in embodied AI and to overcome the limitations of narrow-FoV pinhole models.

Contribution

The paper presents the first exploration of panoramic affordance prediction, a large-scale benchmark dataset (PAP-12K), and PAP, a training-free, coarse-to-fine pipeline inspired by the human foveal visual system.

Findings

01 Existing methods fail on panoramic images due to distortion and limited FoV.

02 The proposed PAP framework significantly outperforms state-of-the-art baselines.

03 Panoramic perception enhances robustness in embodied AI applications.

Abstract

Affordance prediction serves as a critical bridge between perception and action in embodied AI. However, existing research is confined to pinhole camera models, which suffer from narrow Fields of View (FoV) and fragmented observations, often missing critical holistic environmental context. In this paper, we present the first exploration into Panoramic Affordance Prediction, utilizing 360-degree imagery to capture global spatial relationships and holistic scene understanding. To facilitate this novel task, we first introduce PAP-12K, a large-scale benchmark dataset containing over 1,000 ultra-high-resolution (12k, 11904 x 5952) panoramic images with over 12k carefully annotated QA pairs and affordance masks. Furthermore, we propose PAP, a training-free, coarse-to-fine pipeline inspired by the human foveal visual system to tackle the ultra-high resolution and severe distortion inherent in…
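To make the "severe distortion" at the quoted 11904 x 5952 resolution concrete, here is a minimal sketch. It assumes the panoramas use an equirectangular layout; the abstract does not name the projection, although the 2:1 aspect ratio is consistent with it. The function names `pixel_to_lonlat` and `horizontal_stretch` are hypothetical and are not taken from the paper or its code.

```python
import math

# Resolution quoted in the abstract; the equirectangular layout is an assumption.
WIDTH, HEIGHT = 11904, 5952


def pixel_to_lonlat(x, y, width=WIDTH, height=HEIGHT):
    """Map a pixel centre (x, y) to (longitude, latitude) in radians.

    Longitude spans [-pi, pi) left to right; latitude spans
    (pi/2, -pi/2) top to bottom.
    """
    lon = ((x + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (y + 0.5) / height) * math.pi
    return lon, lat


def horizontal_stretch(y, height=HEIGHT):
    """Horizontal stretch of row y relative to the equator, 1 / cos(latitude)."""
    _, lat = pixel_to_lonlat(0, y, height=height)
    return 1.0 / math.cos(lat)


if __name__ == "__main__":
    # Rows closer to the poles are stretched more strongly than the equator row.
    for row in (2976, 1488, 500, 50):
        print(row, round(horizontal_stretch(row), 2))
```

Under this assumption, content at 60 degrees latitude is drawn about twice as wide as the same content at the equator, and the factor grows without bound toward the poles. That is one reason models built for pinhole images can struggle on raw panoramas.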

Code & Models

Datasets

PanoramaOrg/PAP-12K
dataset · 3.2k downloads

Taxonomy

Topics: Advanced Vision and Imaging · Robot Manipulation and Learning · Robotics and Sensor-Based Localization