PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation

Shilin Yan; Xiaohao Xu; Renrui Zhang; Lingyi Hong; Wenchao Chen; Wenqiang Zhang; Wei Zhang

arXiv:2309.12303 · cs.CV · July 30, 2024 · 2 cites

TL;DR

This paper introduces PanoVOS, a new panoramic video dataset, and proposes PSCFormer, a transformer-based model that effectively handles pixel-level content discontinuities for improved panoramic video segmentation.

Contribution

The paper provides the first panoramic video dataset for segmentation and introduces PSCFormer, a transformer model that leverages semantic boundary information for better performance.

Findings

01

Existing models fail on panoramic videos due to content discontinuities.

02

PSCFormer outperforms previous state-of-the-art models in panoramic video segmentation.

03

PanoVOS dataset presents new challenges for panoramic video analysis.

Abstract

Panoramic videos contain richer spatial information and have attracted considerable attention for the exceptional experience they offer in fields such as autonomous driving and virtual reality. However, existing video segmentation datasets focus only on conventional planar images. To address this gap, we present a panoramic video dataset, PanoVOS. The dataset provides 150 videos with high resolution and diverse motions. To quantify the domain gap between 2D planar videos and panoramic videos, we evaluate 15 off-the-shelf video object segmentation (VOS) models on PanoVOS. Through error analysis, we find that all of them fail to handle the pixel-level content discontinuities of panoramic videos. Thus, we present a Panoramic Space Consistency Transformer (PSCFormer), which can effectively utilize the semantic boundary information of the previous frame for…
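The content discontinuity mentioned in the abstract can be illustrated with a small sketch. It assumes frames stored in an equirectangular-style layout, where the left and right image borders are adjacent on the sphere; the abstract does not state the projection, so this is an assumption. The helper `count_components` and its `wrap_horizontal` flag are hypothetical names for illustration and are not part of PSCFormer or the PanoVOS code.

```python
from collections import deque


def count_components(mask, wrap_horizontal=False):
    """Count 4-connected foreground regions in a binary mask (list of rows).

    With wrap_horizontal=True, the first and last columns are treated as
    neighbours, mimicking the left/right seam of a panoramic frame
    (an assumption about the frame layout, not taken from the paper).
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = 0
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # New unvisited foreground pixel: flood-fill its region.
            components += 1
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if wrap_horizontal:
                        nc %= width  # step across the seam
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return components


# Toy mask: one object that straddles the left/right seam of the frame.
mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
planar = count_components(mask)                           # seam ignored
panoramic = count_components(mask, wrap_horizontal=True)  # seam respected
print(planar, panoramic)
```

Treated as an ordinary planar image, the object splits into two disconnected regions; with the seam respected it is a single region. This is one plausible reading of why models designed for planar video may struggle on panoramic frames.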

Code & Models

Repositories

shilinyan99/panovos (Official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Byte Pair Encoding · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection