TL;DR
This paper introduces a graph memory network for video object segmentation that efficiently adapts to new videos and appearance changes by storing and updating frame representations through learnable controllers, improving performance on challenging benchmarks.
Contribution
The work presents a novel episodic graph memory network with learnable controllers for dynamic memory management, enabling effective one-shot and zero-shot video object segmentation.
Findings
Achieves superior results on four benchmark datasets.
Effectively adapts to appearance variations with limited visual information.
Demonstrates strong generalization in both one-shot and zero-shot settings.
Abstract
How to make a segmentation model efficiently adapt to a specific video and to online target appearance variations are fundamentally crucial issues in the field of video object segmentation. In this work, a graph memory network is developed to address the novel idea of "learning to update the segmentation model". Specifically, we exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges. Further, learnable controllers are embedded to ease memory reading and writing, as well as maintain a fixed memory scale. The structured, external memory design enables our model to comprehensively mine and quickly store new knowledge, even with limited visual information, and the differentiable memory controllers slowly learn an abstract method for storing useful representations in the memory and how to later use…
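The memory structure the abstract describes (frames stored as nodes of a fully connected graph, edges capturing cross-frame correlations, and a memory that stays a fixed size while it is read and updated) can be illustrated with a small sketch. This is not the authors' implementation. The class name `GraphMemory`, the dot-product similarity, the softmax edge weights, and the constant `gate` are all assumptions chosen for illustration. In the paper the read and write controllers are learned, whereas here they are fixed functions.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class GraphMemory:
    """Hypothetical fixed-size memory: one node per stored frame embedding.

    Requires at least two nodes so that every node has a neighbour.
    """

    def __init__(self, frame_embeddings):
        self.nodes = [list(v) for v in frame_embeddings]

    def edge_weights(self, i):
        # Edges from node i to every other node (fully connected graph).
        # Assumption: weights are a softmax over dot-product similarity.
        others = [j for j in range(len(self.nodes)) if j != i]
        weights = softmax([dot(self.nodes[i], self.nodes[j]) for j in others])
        return dict(zip(others, weights))

    def read(self, query):
        # Memory read: attention-weighted sum of all node embeddings.
        weights = softmax([dot(query, node) for node in self.nodes])
        dim = len(query)
        return [
            sum(w * node[d] for w, node in zip(weights, self.nodes))
            for d in range(dim)
        ]

    def update(self, gate=0.5):
        # Memory write: one message-passing step. Each node blends its own
        # state with a message aggregated from its neighbours. The node count
        # never changes, so the memory scale stays fixed.
        # Assumption: `gate` is a constant here; a learned controller would
        # predict it per node.
        new_nodes = []
        for i, node in enumerate(self.nodes):
            w = self.edge_weights(i)
            message = [
                sum(w[j] * self.nodes[j][d] for j in w)
                for d in range(len(node))
            ]
            new_nodes.append(
                [(1 - gate) * x + gate * m for x, m in zip(node, message)]
            )
        self.nodes = new_nodes
```

A query embedding for the current frame would be passed to `read`, and `update` would be called once per episode step. Setting `gate=0.0` leaves the memory untouched, which makes the effect of the write step easy to isolate.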
Taxonomy
Methods: Memory Network
