SiamPolar: Semi-supervised Realtime Video Object Segmentation with Polar Representation

Yaochen Li; Yuhui Hong; Yonghong Song; Chao Zhu; Ying Zhang; Ruihao Wang

arXiv:2110.14773·cs.CV·October 29, 2021

TL;DR

SiamPolar is a semi-supervised, real-time video object segmentation method that combines a new polar representation with an asymmetric Siamese network to improve speed for autonomous vehicle applications.

Contribution

The paper presents a new polar representation and an asymmetric Siamese network architecture for faster semi-supervised video object segmentation that runs in real time.

Findings

01. Achieves real-time performance on the DAVIS-2016 dataset

02. Reduces the parameters needed for mask encoding with minimal accuracy loss

03. Demonstrates effectiveness on multiple public datasets

Abstract

Video object segmentation (VOS) is an essential part of autonomous vehicle navigation. For autonomous vehicle algorithms, real-time speed matters as much as accuracy. In this paper, we propose a semi-supervised real-time method based on the Siamese network using a new polar representation. The method is initialized with bounding boxes rather than object masks, so it can also be applied to video object detection tasks. The polar representation reduces the number of parameters needed to encode masks with only a slight loss of accuracy, so the algorithm speed can be improved significantly. An asymmetric Siamese network is also developed to extract features at different spatial scales. Moreover, the peeling convolution is proposed to reduce the antagonism among the branches of the polar head. The repeated cross-correlation and semi-FPN are designed based on this idea. The…
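To make the parameter-reduction claim concrete, here is a minimal sketch of one common way to encode a mask in polar form: a center point plus a fixed number of ray lengths measured at evenly spaced angles. The abstract does not spell out the exact formulation, so the centroid-based center, the 36-ray default, and the function names `encode_polar` and `decode_polar` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def encode_polar(mask, num_rays=36):
    """Encode a binary mask as (center, ray lengths).

    Assumption: the center is the mask centroid and each ray length is the
    farthest foreground pixel inside that ray's angular bin.
    """
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    dy, dx = ys - cy, xs - cx
    dist = np.hypot(dx, dy)
    # Map angles into [0, 2*pi) and assign each pixel to the nearest ray.
    angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    step = 2 * np.pi / num_rays
    bins = np.floor((angle + step / 2) / step).astype(int) % num_rays
    rays = np.zeros(num_rays)
    # Unbuffered maximum: keep the largest distance seen in each bin.
    np.maximum.at(rays, bins, dist)
    return (cx, cy), rays


def decode_polar(center, rays):
    """Turn (center, ray lengths) back into polygon vertices, shape (N, 2)."""
    cx, cy = center
    angles = np.arange(len(rays)) * 2 * np.pi / len(rays)
    return np.stack(
        [cx + rays * np.cos(angles), cy + rays * np.sin(angles)], axis=1
    )
```

With 36 rays, an object is described by 38 numbers (two for the center, 36 lengths) regardless of image resolution, whereas a dense mask needs one value per pixel. The trade-off is that a single length per angle cannot represent strongly concave shapes exactly, which is consistent with the slight accuracy loss the abstract mentions.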

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Siamese Network · Convolution