Adaptive Memory Management for Video Object Segmentation
Ali Pourganjalikhan, Charalambos Poullis

TL;DR
This paper introduces an adaptive memory management strategy for video object segmentation that maintains high accuracy while significantly improving inference speed by discarding obsolete features based on their importance.
Contribution
It proposes a novel adaptive memory bank approach that dynamically manages stored features, enabling efficient segmentation of videos of arbitrary length without sacrificing performance.
Findings
Outperforms fixed-sized memory strategies on the DAVIS and YouTube-VOS datasets.
Increases inference speed by up to 80%.
Achieves comparable accuracy to growing memory banks.
Abstract
Matching-based networks have achieved state-of-the-art performance on video object segmentation (VOS) by storing every k-th frame in an external memory bank for later inference. Storing the predictions for intermediate frames gives the network richer cues for segmenting an object in the current frame. However, the memory bank grows with the length of the video, which slows inference and makes it impractical to handle videos of arbitrary length. This paper proposes an adaptive memory bank strategy for matching-based networks in semi-supervised VOS that can handle videos of arbitrary length by discarding obsolete features. Features are indexed by their importance in segmenting the objects in previous frames. Based on this index, we discard unimportant features to make room for new ones. We present…
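The abstract describes the mechanism only at a high level: stored features are ranked by how important they were for segmenting earlier frames, and the least important ones are discarded to make room for new ones. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes that "importance" is the average softmax attention a stored feature receives per memory read, and that newly written features are protected from eviction until they have been read at least once. The class and method names (`AdaptiveMemoryBank`, `read`, `write`) are hypothetical.

```python
import numpy as np


class AdaptiveMemoryBank:
    """Fixed-capacity key/value memory that evicts its least important entries.

    Hypothetical sketch: "importance" is the average softmax attention an entry
    receives per read, an assumed stand-in for the paper's indexing rule.
    """

    def __init__(self, capacity: int, key_dim: int, value_dim: int):
        self.capacity = capacity
        self.keys = np.zeros((0, key_dim))
        self.values = np.zeros((0, value_dim))
        self.usage = np.zeros(0)  # total attention mass received per entry
        self.reads = np.zeros(0)  # number of reads since each entry was stored

    def importance(self) -> np.ndarray:
        # Entries never read yet get +inf so they are not evicted before
        # they have had a chance to be used (an assumption of this sketch).
        with np.errstate(divide="ignore", invalid="ignore"):
            score = self.usage / self.reads
        return np.where(self.reads > 0, score, np.inf)

    def read(self, query: np.ndarray) -> np.ndarray:
        """Attend from query features (N, key_dim) to memory; return (N, value_dim)."""
        if len(self.keys) == 0:
            raise ValueError("memory is empty")
        logits = query @ self.keys.T / np.sqrt(self.keys.shape[1])
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)  # softmax over memory entries
        self.usage += weights.sum(axis=0)  # credit each entry with its attention
        self.reads += 1
        return weights @ self.values

    def write(self, keys: np.ndarray, values: np.ndarray) -> None:
        """Store new features, first discarding the least important old ones."""
        n_new = keys.shape[0]
        if n_new > self.capacity:
            raise ValueError("cannot store more features than the capacity")
        overflow = len(self.keys) + n_new - self.capacity
        if overflow > 0:
            # Drop the `overflow` lowest-importance entries, keep temporal order.
            order = np.argsort(self.importance(), kind="stable")
            keep = np.sort(order[overflow:])
            self.keys, self.values = self.keys[keep], self.values[keep]
            self.usage, self.reads = self.usage[keep], self.reads[keep]
        self.keys = np.concatenate([self.keys, keys])
        self.values = np.concatenate([self.values, values])
        self.usage = np.concatenate([self.usage, np.zeros(n_new)])
        self.reads = np.concatenate([self.reads, np.zeros(n_new)])


# Usage: read from memory for every frame, then write that frame's features.
rng = np.random.default_rng(0)
bank = AdaptiveMemoryBank(capacity=8, key_dim=4, value_dim=2)
for frame in range(10):
    if len(bank.keys):
        readout = bank.read(rng.normal(size=(16, 4)))
    bank.write(rng.normal(size=(4, 4)), rng.normal(size=(4, 2)))
print(len(bank.keys))  # 8
```

Because eviction happens inside `write`, the memory never holds more than `capacity` entries, so the cost of each read stays constant regardless of video length, whereas an every-k memory keeps adding entries. A practical variant might also pin the first (annotated) frame so it is never discarded; that choice is likewise an assumption here.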
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
