TL;DR
CapsuleVOS is a capsule-network-based method for semi-supervised video object segmentation that segments several frames at once and handles small objects and occlusions without relying on optical flow.
Contribution
The paper presents a novel capsule-based approach with a new routing algorithm, a zooming module for small objects, and a recurrent memory module for occlusion handling, advancing semi-supervised video segmentation.
Findings
Outperforms current offline methods on the YouTube-VOS dataset
Runs nearly twice as fast as competing approaches
Effectively handles small objects and occlusions
Abstract
In this work, we propose a capsule-based approach for semi-supervised video object segmentation. Current video object segmentation methods are frame-based and often require optical flow to capture temporal consistency across frames, which can be difficult to compute. To this end, we propose a video-based capsule network, CapsuleVOS, which can segment several frames at once, conditioned on a reference frame and segmentation mask. This conditioning is performed through a novel routing algorithm for attention-based efficient capsule selection. We address two challenging issues in video object segmentation: 1) segmentation of small objects and 2) occlusion of objects across time. The issue of segmenting small objects is addressed with a zooming module, which allows the network to process small spatial regions of the video. Apart from this, the framework utilizes a novel memory module based on…
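The abstract says the zooming module lets the network process small spatial regions of the video, but it does not say how those regions are chosen. As a rough illustration only, the sketch below crops every frame of a clip to a padded bounding box around the reference mask. The function name `zoom_crop`, the `margin` parameter, and the bounding-box strategy are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def zoom_crop(frames, ref_mask, margin=0.25):
    """Crop all frames to a padded box around the reference mask.

    frames:   array of shape (T, H, W, C)
    ref_mask: binary array of shape (H, W) for the reference frame
    margin:   hypothetical padding, as a fraction of the box size
    Returns the cropped clip and the box as (y0, x0, y1, x1).
    """
    ys, xs = np.nonzero(ref_mask)
    h, w = ref_mask.shape
    if ys.size == 0:
        # No object pixels in the mask: fall back to the full frame.
        return frames, (0, 0, h, w)

    # Tight bounding box around the object (end indices are exclusive).
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    x0, x1 = int(xs.min()), int(xs.max()) + 1

    # Pad the box and clamp it to the frame borders.
    pad_y = int(round((y1 - y0) * margin))
    pad_x = int(round((x1 - x0) * margin))
    y0, y1 = max(0, y0 - pad_y), min(h, y1 + pad_y)
    x0, x1 = max(0, x0 - pad_x), min(w, x1 + pad_x)

    return frames[:, y0:y1, x0:x1], (y0, x0, y1, x1)

# Usage: a 4-frame clip with a small 4x4 object in the reference mask.
clip = np.zeros((4, 64, 64, 3), dtype=np.float32)
mask = np.zeros((64, 64), dtype=np.uint8)
mask[30:34, 40:44] = 1
zoomed, box = zoom_crop(clip, mask)
```

Using a single box for the whole clip keeps the frames spatially aligned, which matters when several frames are segmented at once. A real system would also have to map the predicted masks back to full-frame coordinates using the returned box.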
