PanoSAM2: Lightweight Distortion- and Memory-aware Adaptions of SAM2 for 360 Video Object Segmentation

Dingwen Xiao; Weiming Zhang; Shiqi Wen; Lin Wang

arXiv:2604.07901 · cs.CV · April 10, 2026

TL;DR

PanoSAM2 is a lightweight adaptation of SAM2 for 360 video object segmentation (360VOS) that addresses projection distortion, left-right semantic inconsistency, and sparse object mask information in SAM2's memory to improve temporal coherence and accuracy.

Contribution

It introduces novel distortion-aware decoding, a distortion-guided loss, and a long-short memory module for effective 360VOS with minimal additional complexity.

Findings

01 Achieves +5.6 improvement on 360VOTS

02 Achieves +6.7 improvement on PanoVOS

03 Demonstrates significant gains over SAM2 in 360VOS tasks

Abstract

360 video object segmentation (360VOS) aims to predict temporally consistent masks in 360 videos, offering full-scene coverage and benefiting applications such as VR/AR and embodied AI. Learning a 360VOS model is nontrivial due to the lack of high-quality labeled datasets. Recently, Segment Anything Models (SAMs), especially SAM2 with its memory module, have shown strong promptable VOS capability. However, directly using SAM2 for 360VOS yields implausible results because 360 videos suffer from projection distortion, semantic inconsistency between the left and right sides, and sparse object mask information in SAM2's memory. To this end, we propose PanoSAM2, a novel 360VOS framework based on our lightweight distortion- and memory-aware adaptation strategies for SAM2, which achieves reliable 360VOS while retaining SAM2's user-friendly prompting design. Concretely, to tackle the projection distortion…
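The two geometric issues named in the abstract, projection distortion and left-right inconsistency, can be illustrated with a small sketch. It assumes the frames are stored in equirectangular projection, which this excerpt does not state, and it is not the PanoSAM2 method: the function names (`erp_row_weights`, `weighted_iou`, `shift_seam`) and the cosine-latitude weighting are hypothetical choices made only for illustration.

```python
import numpy as np


def erp_row_weights(height):
    """Per-row weight cos(latitude) for an equirectangular frame (assumed format).

    Rows near the poles are stretched horizontally, so each of their pixels
    covers less area on the sphere than a pixel at the equator.
    """
    rows = np.arange(height) + 0.5                # pixel-center row coordinates
    latitude = (0.5 - rows / height) * np.pi      # +pi/2 at the top, -pi/2 at the bottom
    return np.cos(latitude)


def weighted_iou(pred, target):
    """IoU in which each pixel counts by its approximate area on the sphere.

    Hypothetical metric for illustration; not the score reported in the paper.
    """
    w = erp_row_weights(pred.shape[0])[:, None]   # broadcast one weight per row
    inter = (w * (pred & target)).sum()
    union = (w * (pred | target)).sum()
    return float(inter / union) if union > 0 else 1.0


def shift_seam(mask, shift=None):
    """Circularly roll a frame horizontally (default: half the width).

    An object cut by the left/right border becomes one connected region,
    because the two borders are adjacent on the sphere.
    """
    if shift is None:
        shift = mask.shape[1] // 2
    return np.roll(mask, shift, axis=1)


if __name__ == "__main__":
    # Toy boolean mask whose object straddles the left/right seam.
    mask = np.zeros((4, 8), dtype=bool)
    mask[:, 0] = True
    mask[:, 7] = True
    print(erp_row_weights(4))
    print(shift_seam(mask).astype(int))
    print(weighted_iou(mask, mask))
```

In the toy mask, the object occupies the first and last columns, which look disconnected in the flat image; after `shift_seam` the same pixels sit side by side in the middle of the frame.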
