Hierarchical Memory Matching Network for Video Object Segmentation
Hongje Seong, Seoung Wug Oh, Joon-Young Lee, Seongwon Lee, Suhyeon Lee, Euntai Kim

TL;DR
The paper introduces a Hierarchical Memory Matching Network (HMMN) for semi-supervised video object segmentation, utilizing multi-scale memory reading and temporal smoothness to improve accuracy and efficiency.
Contribution
It proposes two memory read modules, kernel guided memory matching and top-k guided memory matching, which together enable multi-scale memory access and hierarchical matching and advance the state of the art in video object segmentation.
Findings
Achieved state-of-the-art results on DAVIS and YouTube-VOS datasets.
Introduced kernel guided memory matching for accurate retrieval.
Implemented hierarchical memory matching for detailed object masks.
Abstract
We present Hierarchical Memory Matching Network (HMMN) for semi-supervised video object segmentation. Based on a recent memory-based method [33], we propose two advanced memory read modules that enable us to perform memory reading in multiple scales while exploiting temporal smoothness. We first propose a kernel guided memory matching module that replaces the non-local dense memory read, commonly adopted in previous memory-based methods. The module imposes the temporal smoothness constraint in the memory read, leading to accurate memory retrieval. More importantly, we introduce a hierarchical memory matching scheme and propose a top-k guided memory matching module in which memory read on a fine-scale is guided by that on a coarse-scale. With the module, we perform memory read in multiple scales efficiently and leverage both high-level semantic and low-level fine-grained memory features…
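The coarse-to-fine idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes dot-product affinities with a softmax over memory locations, and it assumes that each coarse location corresponds to an s × s block of fine locations. The function names (`dense_memory_read`, `topk_indices`, `expand_to_fine`, `topk_guided_read`) are hypothetical, and the kernel guidance that enforces temporal smoothness is omitted.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_memory_read(q_key, m_key, m_val):
    # Non-local dense read: every query location attends to every memory location.
    # q_key: (Nq, C), m_key: (Nm, C), m_val: (Nm, D) -> (Nq, D)
    affinity = q_key @ m_key.T
    return softmax(affinity, axis=1) @ m_val


def topk_indices(q_key, m_key, k):
    # For each coarse query location, indices of the k best-matching memory locations.
    affinity = q_key @ m_key.T
    return np.argpartition(-affinity, k - 1, axis=1)[:, :k]


def expand_to_fine(coarse_idx, hc, wc, s):
    # Assumption: coarse memory location (t, y, x) covers the s x s fine block
    # starting at (t, y * s, x * s). Returns flat fine indices, shape (..., k * s * s).
    t, rem = np.divmod(coarse_idx, hc * wc)
    y, x = np.divmod(rem, wc)
    dy, dx = np.meshgrid(np.arange(s), np.arange(s), indexing="ij")
    hf, wf = hc * s, wc * s
    fy = y[..., None] * s + dy.ravel()
    fx = x[..., None] * s + dx.ravel()
    fine = t[..., None] * hf * wf + fy * wf + fx
    return fine.reshape(*coarse_idx.shape[:-1], -1)


def topk_guided_read(q_key_f, m_key_f, m_val_f, coarse_idx, hc, wc, s):
    # Fine-scale read restricted to the candidates selected at the coarse scale.
    hf, wf = hc * s, wc * s
    cand = expand_to_fine(coarse_idx, hc, wc, s)        # (hc * wc, K)
    fy, fx = np.divmod(np.arange(hf * wf), wf)
    parent = (fy // s) * wc + (fx // s)                 # coarse parent of each fine query
    cand = cand[parent]                                 # (hf * wf, K)
    keys, vals = m_key_f[cand], m_val_f[cand]           # (hf * wf, K, C) and (hf * wf, K, D)
    affinity = np.einsum("qc,qkc->qk", q_key_f, keys)
    return np.einsum("qk,qkd->qd", softmax(affinity, axis=1), vals)
```

In this sketch the fine-scale read scores only k · s² candidates per query location instead of every fine memory location, which is where the efficiency gain would come from. When k equals the number of coarse memory locations, the candidates cover the whole fine memory and the result matches the dense read.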
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
