Learning to Learn Better for Video Object Segmentation
Meng Lan, Jing Zhang, Lefei Zhang, Dacheng Tao

TL;DR
This paper introduces LLB, a framework for semi-supervised video object segmentation (SVOS) that improves how target features are learned and fused, making segmentation more accurate and more robust.
Contribution
The paper proposes a discriminative label generation module and an adaptive fusion module to better learn and combine target features in SVOS.
Findings
Achieves state-of-the-art performance on public benchmarks.
Improves target feature discrimination and fusion robustness.
Enhances segmentation accuracy and robustness against distractors.
Abstract
The recently proposed joint learning framework (JOINT) integrates matching-based transductive reasoning and online inductive learning to achieve accurate and robust semi-supervised video object segmentation (SVOS). However, using the mask embedding as the label to guide the generation of target features in the two branches may result in inadequate target representation and degrade performance. Moreover, how to fuse the target features of the two branches in a principled way, rather than simply adding them together, so as to avoid the adverse effect of one dominant branch, has not been investigated. In this paper, we propose a novel framework, termed LLB, that emphasizes Learning to Learn Better target features for SVOS. It comprises a discriminative label generation module (DLGM) and an adaptive fusion module designed to address these issues. Technically, the DLGM takes the background-filtered…
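To make the contrast between "simply adding" two branch features and fusing them adaptively more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's adaptive fusion module: the abstract above does not describe that module's internals, so the sigmoid gating scheme, the function names, and the parameters `w_match` and `w_online` are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def additive_fusion(f_match, f_online):
    """Baseline: element-wise sum of the two branch features."""
    return f_match + f_online


def gated_fusion(f_match, f_online, w_match, w_online):
    """Hypothetical gated fusion (an assumption, not the paper's design).

    f_match, f_online: branch features of shape (C, H, W).
    w_match, w_online: (C, C) matrices acting as 1x1 convolutions
                       that produce one gate value per feature element.
    """
    # A 1x1 convolution is a channel-mixing matrix applied at every pixel.
    g_match = sigmoid(np.einsum("oc,chw->ohw", w_match, f_match))
    g_online = sigmoid(np.einsum("oc,chw->ohw", w_online, f_online))
    # Each branch is reweighted element-wise before the sum.
    return g_match * f_match + g_online * f_online


rng = np.random.default_rng(0)
C, H, W = 4, 3, 3
f_match = rng.normal(size=(C, H, W))
f_online = 10.0 * rng.normal(size=(C, H, W))  # this branch dominates in magnitude
w_match = rng.normal(size=(C, C))   # illustrative, untrained parameters
w_online = rng.normal(size=(C, C))  # illustrative, untrained parameters

fused_add = additive_fusion(f_match, f_online)
fused_gate = gated_fusion(f_match, f_online, w_match, w_online)
```

With plain addition, the larger-magnitude branch dictates the fused feature. In the gated variant each gate lies in (0, 1), so a trained model could in principle learn to suppress an unreliable branch at specific locations. With all-zero gate weights every gate equals 0.5, and the result reduces to half of the additive baseline.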
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection · Advanced Neural Network Applications
