Addressing Issues with Working Memory in Video Object Segmentation

Clayton Bromley; Alexander Moore; Amar Saini; Douglas Poland; Carmen Carrano

arXiv:2410.22451 · cs.CV · October 31, 2024

TL;DR

This paper introduces a simple decision function to improve video object segmentation models by regulating working memory updates, enhancing robustness against sudden camera view changes and context shifts in real-world scenarios.

Contribution

A novel, easy-to-implement decision function that improves existing working memory-based VOS models by filtering irrelevant frames during abrupt scene changes.

Findings

01. Significant performance gains on videos with frame interjections.

02. Enhanced robustness to camera cuts and extreme context changes.

03. Applicable to any existing working memory-based VOS model.

Abstract

Contemporary state-of-the-art video object segmentation (VOS) models compare incoming unannotated images to a history of image-mask relations via affinity or cross-attention to predict object masks. We refer to the internal memory state of the initial image-mask pair and past image-masks as a working memory buffer. While current state-of-the-art models perform very well on clean video data, their reliance on a working memory of previous frames leaves room for error. Affinity-based algorithms carry the inductive bias that consecutive frames are temporally continuous. To account for inconsistent camera views of the desired object, working memory models need an algorithmic modification that regulates memory updates and avoids writing irrelevant frames into working memory. A simple algorithmic change is proposed that can be applied to any existing working memory-based…
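
The idea of gating writes to the working memory buffer can be illustrated with a minimal sketch. The abstract as shown here does not state the actual decision criterion, so everything below is an assumption made for illustration: the class name `GatedWorkingMemory`, the use of one global feature vector per frame, the cosine-similarity test, and the `threshold` and `capacity` values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D feature vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


class GatedWorkingMemory:
    """Toy working memory buffer with a write gate.

    Hypothetical design: a frame is written only if its feature vector is
    similar enough to something already stored. This is NOT the paper's
    decision function, only a stand-in for the general idea.
    """

    def __init__(self, threshold: float = 0.5, capacity: int = 5):
        self.threshold = threshold  # assumed hyperparameter
        self.capacity = capacity    # assumed limit on recent frames
        self.initial = None         # feature of the annotated first frame, never evicted
        self.recent = []            # features of later accepted frames

    def should_write(self, feature: np.ndarray) -> bool:
        """Decide whether a frame is relevant enough to enter memory."""
        if self.initial is None:
            return True  # the first (annotated) frame is always stored
        stored = [self.initial] + self.recent
        best = max(cosine_similarity(feature, m) for m in stored)
        return best >= self.threshold

    def update(self, feature: np.ndarray) -> bool:
        """Write the frame if the gate allows it; return whether it was written."""
        if not self.should_write(feature):
            return False
        if self.initial is None:
            self.initial = feature
        else:
            self.recent.append(feature)
            if len(self.recent) > self.capacity:
                self.recent.pop(0)  # evict the oldest recent frame
        return True
```

In this sketch, a frame from an abrupt camera cut would produce a feature that is dissimilar to every stored entry, so `update` returns `False` and the buffer is left unchanged. A real VOS model would operate on spatial key/value feature maps rather than a single vector per frame.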

Taxonomy

Topics: Visual Attention and Saliency Detection

Methods: VOS