Boosting Video Object Segmentation based on Scale Inconsistency
Hengyi Wang, Changjae Oh

TL;DR
This paper introduces a refinement framework built on scale inconsistency: pre-trained semi-supervised video object segmentation (VOS) models predict different masks for the same frame at different input sizes, and the framework exploits that disagreement to improve accuracy and robustness.
Contribution
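It proposes a pixel-level attention module that aggregates predictions from different-size inputs, a training regularizer based on the pixel-level variance of those predictions, and a self-supervised online adaptation method for test-time optimization, all driven by scale inconsistency.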
Findings
Improves VOS accuracy on the DAVIS 16 and DAVIS 17 datasets
Applicable to various existing VOS models
Enhances robustness through scale-based regularization
Abstract
We present a refinement framework to boost the performance of pre-trained semi-supervised video object segmentation (VOS) models. Our work is based on scale inconsistency, which is motivated by the observation that existing VOS models generate inconsistent predictions from input frames with different sizes. We use the scale inconsistency as a clue to devise a pixel-level attention module that aggregates the advantages of the predictions from different-size inputs. The scale inconsistency is also used to regularize the training based on a pixel-level variance measured by an uncertainty estimation. We further present a self-supervised online adaptation, tailored for test-time optimization, that bootstraps the predictions without ground-truth masks based on the scale inconsistency. Experiments on DAVIS 16 and DAVIS 17 datasets show that our framework can be generically applied to various…
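The notion of scale inconsistency described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the learned pixel-level attention module is replaced here by a plain per-pixel mean, and `resize_nearest`, `scale_inconsistency`, `toy_model`, and the scale set `(0.5, 1.0, 2.0)` are all hypothetical names and values chosen for illustration. The sketch only shows how predictions from resized copies of one frame can be brought back to a common resolution and compared pixel by pixel, with the variance acting as an inconsistency map.

```python
from statistics import fmean, pvariance


def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of floats."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def scale_inconsistency(frame, model, scales=(0.5, 1.0, 2.0)):
    """Run `model` on resized copies of `frame` and compare the predictions.

    Returns (mean_map, var_map) at the original resolution. The mean is a
    simple stand-in for a learned fusion; the variance marks pixels where
    predictions from different input sizes disagree.
    """
    h, w = len(frame), len(frame[0])
    preds = []
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        pred = model(resize_nearest(frame, sh, sw))
        preds.append(resize_nearest(pred, h, w))  # back to the original size
    mean_map = [[fmean([p[r][c] for p in preds]) for c in range(w)] for r in range(h)]
    var_map = [[pvariance([p[r][c] for p in preds]) for c in range(w)] for r in range(h)]
    return mean_map, var_map


def toy_model(grid):
    """Hypothetical stand-in for a VOS model: its confidence grows with the
    input width, which mimics scale-dependent predictions."""
    conf = min(1.0, len(grid[0]) / 8)
    return [[0.5 + (conf / 2 if v > 0.5 else -conf / 2) for v in row] for row in grid]


# A 4x4 "frame" whose right half is foreground.
frame = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
mean_map, var_map = scale_inconsistency(frame, toy_model)
```

In this toy setting every pixel has non-zero variance because the stand-in model changes its confidence with input size. A model that returned identical predictions at every scale would give a variance map of zeros.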
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning
Methods: VOS
