Addressing Issues with Working Memory in Video Object Segmentation
Clayton Bromley, Alexander Moore, Amar Saini, Douglas Poland, Carmen Carrano

TL;DR
This paper introduces a simple decision function to improve video object segmentation models by regulating working memory updates, enhancing robustness against sudden camera view changes and context shifts in real-world scenarios.
Contribution
A novel, easy-to-implement decision function that improves existing working memory-based VOS models by filtering irrelevant frames during abrupt scene changes.
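This summary does not spell out the paper's exact decision rule. The sketch below shows one way such a write gate could work, and it rests on an assumption that does not come from the paper: the new frame's key features are compared with those already in working memory, and the write is skipped when the average best-match cosine similarity falls below a threshold. `WorkingMemory`, `should_write`, `tau`, and `max_frames` are hypothetical names, not the authors' API.

```python
import numpy as np


def cosine_affinity(query: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Cosine similarity between every query row and every memory-key row, shape (Nq, Nm)."""
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-8)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    return q @ k.T


class WorkingMemory:
    """Hypothetical buffer: the annotated first frame is kept permanently; recent frames are FIFO."""

    def __init__(self, init_keys: np.ndarray, tau: float = 0.5, max_frames: int = 5):
        self.init_keys = init_keys           # (N, C) keys of the initial image-mask pair
        self.recent: list[np.ndarray] = []   # keys of past frames written to memory
        self.tau = tau                       # hypothetical acceptance threshold
        self.max_frames = max_frames         # hypothetical capacity for non-initial frames

    def all_keys(self) -> np.ndarray:
        return np.concatenate([self.init_keys, *self.recent], axis=0)

    def should_write(self, frame_keys: np.ndarray) -> bool:
        # For each location in the new frame, find its best-matching memory key,
        # then average. A low score suggests a camera cut or context shift.
        score = float(cosine_affinity(frame_keys, self.all_keys()).max(axis=1).mean())
        return score >= self.tau

    def update(self, frame_keys: np.ndarray) -> bool:
        """Write the frame only if the gate accepts it; return whether it was written."""
        if not self.should_write(frame_keys):
            return False
        self.recent.append(frame_keys)
        if len(self.recent) > self.max_frames:
            self.recent.pop(0)  # evict the oldest non-initial frame
        return True


init = np.eye(8)[:4]                  # toy keys for the first frame
mem = WorkingMemory(init, tau=0.5, max_frames=2)
print(mem.update(np.eye(8)[:4]))      # same content, written: True
print(mem.update(np.eye(8)[4:]))      # orthogonal content (a stand-in for a hard cut), skipped: False
```

With `tau=0.5`, a frame whose keys match memory gets written. A frame whose keys are orthogonal to everything stored is rejected, and the buffer stays unchanged. The permanent initial frame follows the abstract's description of the buffer as the initial image-mask pair plus past frames. The threshold and FIFO eviction are illustrative choices only.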
Findings
Significant performance gains on videos with frame interjections.
Enhanced robustness to camera cuts and extreme context changes.
Applicable to any existing working memory-based VOS model.
Abstract
Contemporary state-of-the-art video object segmentation (VOS) models compare incoming unannotated images to a history of image-mask relations via affinity or cross-attention to predict object masks. We refer to the internal memory state of the initial image-mask pair and past image-masks as a working memory buffer. While current state-of-the-art models perform very well on clean video data, their reliance on a working memory of previous frames leaves room for error. Affinity-based algorithms carry the inductive bias that consecutive frames are temporally continuous. To account for inconsistent camera views of the desired object, working memory models need an algorithmic modification that regulates memory updates and avoids writing irrelevant frames into working memory. A simple algorithmic change is proposed that can be applied to any existing working memory-based…
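The abstract's first sentence describes the "compare via affinity" step. Below is a generic sketch of it: a softmax-normalised dot-product readout over the keys stored in working memory. This is an illustrative assumption, not the paper's model. Real VOS models may use other similarity functions and extra filtering, and `memory_readout` and `temperature` are hypothetical names.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def memory_readout(query_keys: np.ndarray, memory_keys: np.ndarray,
                   memory_values: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """
    query_keys:    (Nq, C) keys of the incoming, unannotated frame
    memory_keys:   (Nm, C) keys of the frames stored in working memory
    memory_values: (Nm, D) mask-aware values of those frames
    returns:       (Nq, D) readout that a decoder would turn into a mask
    """
    affinity = query_keys @ memory_keys.T / temperature  # (Nq, Nm) dot-product affinity
    weights = softmax(affinity, axis=1)                  # each query location attends over memory
    return weights @ memory_values


# Toy memory: one "foreground" key (value 1.0) and one "background" key (value 0.0).
memory_keys = np.array([[10.0, 0.0], [0.0, 10.0]])
memory_values = np.array([[1.0], [0.0]])
query = np.array([[10.0, 0.0], [0.0, 10.0]])
print(memory_readout(query, memory_keys, memory_values).round(3))  # approximately [[1.], [0.]]
```

The softmax normalises over every stored entry. As a result, keys written from an unrelated camera view compete for attention weight with the relevant ones, which is one way to see why the abstract argues for regulating what gets written.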
Taxonomy
Topics: Visual Attention and Saliency Detection
Methods: VOS
