SAP: Segment Any 4K Panorama
Lutao Jiang, Zidong Cao, Weikai Chen, Xu Zheng, Yuanhuiyi Lyu, Zhenyang Li, Zeyu HU, Yingda Yin, Keyang Luo, Runze Zhang, Kai Yan, Shengju Qian, Haidi Fan, Yifan Peng, Xin Wang, Hui Xiong, Ying-Cong Chen

TL;DR
SAP introduces a novel approach for high-resolution 4K panoramic instance segmentation by reformulating the task as perspective video segmentation along a spherical path, enabling effective zero-shot generalization.
Contribution
The paper presents a new panoramic segmentation model, SAP, trained on synthesized data, that outperforms existing models on real-world 4K panoramas by reformulating the problem as perspective video segmentation.
Findings
Achieves a +17.2 zero-shot mIoU gain over vanilla SAM2 on a 4K panorama benchmark.
Synthesizes 183,440 4K panoramic images with instance segmentation labels for large-scale training.
Generalizes effectively to high-resolution real-world 360° images.
Abstract
Promptable instance segmentation is widely adopted in embodied and AR systems, yet the performance of foundation models trained on perspective imagery often degrades on 360° panoramas. In this paper, we introduce Segment Any 4K Panorama (SAP), a foundation model for 4K high-resolution panoramic instance-level segmentation. We reformulate panoramic segmentation as fixed-trajectory perspective video segmentation, decomposing a panorama into overlapping perspective patches sampled along a continuous spherical traversal. This memory-aligned reformulation preserves native 4K resolution while restoring the smooth viewpoint transitions required for stable cross-view propagation. To enable large-scale supervision, we synthesize 183,440 4K-resolution panoramic images with instance segmentation labels using the InfiniGen engine. Trained under this trajectory-aligned paradigm, SAP generalizes…
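The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes an equirectangular panorama stored as a NumPy array and uses nearest-neighbour sampling; the function names `perspective_patch` and `serpentine_path`, the 90° field of view, the 45° yaw step, and the three pitch rows are all hypothetical choices for illustration, not the trajectory or parameters used by SAP.

```python
import numpy as np

def perspective_patch(pano, yaw_deg, pitch_deg, fov_deg=90.0, size=64):
    """Sample a square pinhole view from an equirectangular panorama.

    Assumed conventions: x right, y down, z forward; positive pitch looks up.
    Nearest-neighbour lookup keeps the sketch short.
    """
    h, w = pano.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)   # focal length in pixels
    xs = np.arange(size) - (size - 1) / 2               # centred pixel coordinates
    u, v = np.meshgrid(xs, xs)
    dirs = np.stack([u, v, np.full_like(u, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(pitch), -np.sin(pitch)],
                      [0, np.sin(pitch), np.cos(pitch)]])
    rot_y = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                      [0, 1, 0],
                      [-np.sin(yaw), 0, np.cos(yaw)]])
    d = dirs @ (rot_y @ rot_x).T                        # rotate every viewing ray

    lon = np.arctan2(d[..., 0], d[..., 2])              # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))      # latitude, positive is down
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[row, col]

def serpentine_path(yaw_step=45.0, pitches=(-45.0, 0.0, 45.0)):
    """Hypothetical continuous traversal: sweep yaw, reversing on each pitch row."""
    yaws = np.arange(0.0, 360.0, yaw_step)
    path = []
    for i, pitch in enumerate(pitches):
        sweep = yaws if i % 2 == 0 else yaws[::-1]
        path.extend((float(y), float(pitch)) for y in sweep)
    return path
```

With a 90° field of view and a 45° yaw step, neighbouring views share roughly half their content, so the ordered patches behave like frames of a slowly panning video. For a 4K panorama (for example a 2048 × 4096 array), the frame list would be built as `[perspective_patch(pano, y, p, size=1024) for y, p in serpentine_path()]` and passed, in order, to a video segmentation model.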
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
