PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
Shilin Yan, Xiaohao Xu, Renrui Zhang, Lingyi Hong, Wenchao Chen, Wenqiang Zhang, Wei Zhang

TL;DR
This paper introduces PanoVOS, a new panoramic video dataset, and proposes PSCFormer, a transformer-based model that effectively handles pixel-level content discontinuities for improved panoramic video segmentation.
Contribution
The paper presents PanoVOS, a panoramic video dataset for segmentation (150 videos with high resolution and diverse motions), and introduces PSCFormer, a transformer model that uses the semantic boundary information of the previous frame to improve segmentation.
Findings
All 15 off-the-shelf VOS models evaluated on PanoVOS fail to handle the pixel-level content discontinuities of panoramic videos.
PSCFormer outperforms previous state-of-the-art models in panoramic video segmentation.
The PanoVOS dataset poses new challenges for panoramic video analysis.
Abstract
Panoramic videos contain richer spatial information and have attracted considerable attention because of the exceptional experience they offer in fields such as autonomous driving and virtual reality. However, existing datasets for video segmentation focus only on conventional planar images. To address this gap, we present a panoramic video dataset, PanoVOS. The dataset provides 150 videos with high resolution and diverse motions. To quantify the domain gap between 2D planar videos and panoramic videos, we evaluate 15 off-the-shelf video object segmentation (VOS) models on PanoVOS. Through error analysis, we find that all of them fail to handle the pixel-level content discontinuities of panoramic videos. Thus, we present a Panoramic Space Consistency Transformer (PSCFormer), which can effectively utilize the semantic boundary information of the previous frame for…
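The content discontinuity mentioned in the abstract can be illustrated with a toy example. The sketch below is not PSCFormer and is not taken from the paper. It assumes a panoramic frame stored as a flat image whose left and right edges are adjacent in the real scene (as in an equirectangular layout), and the helper names `column_runs` and `roll_columns` are hypothetical. It shows that an object crossing the seam looks like two separate pieces to a model that treats the frame as planar, while a wrap-around view or a circular shift shows a single object.

```python
def column_runs(mask, wrap):
    """Count contiguous groups of occupied columns in a binary mask.

    mask: list of rows, each a list of booleans (True = object pixel).
    wrap: if True, the first and last columns count as neighbours,
          which is the assumed panoramic (360-degree) geometry.
    """
    width = len(mask[0])
    occupied = [any(row[c] for row in mask) for c in range(width)]
    if not any(occupied):
        return 0
    if all(occupied):
        return 1
    runs = 0
    for c, occ in enumerate(occupied):
        if c == 0:
            # On a planar image column 0 has no left neighbour.
            left = occupied[-1] if wrap else False
        else:
            left = occupied[c - 1]
        if occ and not left:
            runs += 1  # a new run starts here
    return runs


def roll_columns(mask, shift):
    """Circularly shift every row to the right by `shift` columns."""
    width = len(mask[0])
    return [[row[(c - shift) % width] for c in range(width)] for row in mask]


# Toy 4x12 frame: one object covering columns 10, 11, 0 and 1,
# so it straddles the left/right seam of the flattened panorama.
mask = [[r in (1, 2) and (c < 2 or c >= 10) for c in range(12)]
        for r in range(4)]

planar_pieces = column_runs(mask, wrap=False)   # seam treated as an image border
panoramic_pieces = column_runs(mask, wrap=True)  # seam treated as continuous
shifted_pieces = column_runs(roll_columns(mask, 6), wrap=False)

print(planar_pieces, panoramic_pieces, shifted_pieces)
```

In this toy setting the planar view splits the object in two, while the wrap-around view and the circularly shifted frame each contain one piece. This only illustrates the geometry of the problem under the stated assumption; how PSCFormer itself handles the discontinuity is described in the paper, not here.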
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Byte Pair Encoding · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection
