Fast Pixel-Matching for Video Object Segmentation
Siyue Yu, Jimin Xiao, BingFeng Zhang, Eng Gee Lim

TL;DR
This paper introduces NPMCA-net, a pixel-matching model for video object segmentation that balances high performance with fast inference speed by leveraging mask-propagation and non-local techniques, effectively handling appearance variations and occlusions.
Contribution
The paper presents NPMCA-net, a novel pixel-matching approach that combines reference and previous frame information for efficient and robust video object segmentation.
Findings
Achieves state-of-the-art IoU scores on the DAVIS datasets.
Runs at 0.11 seconds per frame (about 9 frames per second).
Effectively handles large appearance variations and occlusions.
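The IoU reported on DAVIS is the Jaccard index (region similarity J): the number of pixels in both the predicted and ground-truth masks, divided by the number of pixels in either mask. The small NumPy sketch below is for illustration only. The function names are hypothetical. Treating two empty masks as IoU 1.0 is an assumed convention, and the official DAVIS evaluation code may handle that case and per-sequence averaging differently.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index (IoU) between a predicted and a ground-truth binary mask."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: assumed convention, counted as perfect agreement.
        return 1.0
    inter = np.logical_and(pred, gt).sum()
    return float(inter) / float(union)

def mean_iou(preds, gts) -> float:
    """Average the per-frame IoU over a sequence of mask pairs."""
    scores = [mask_iou(p, g) for p, g in zip(preds, gts)]
    return float(np.mean(scores))

# Toy example: 2 overlapping pixels out of 4 covered pixels gives IoU = 0.5.
pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(mask_iou(pred, gt))
```

A benchmark score is then an average of such per-frame values over frames and sequences.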
Abstract
Video object segmentation, which aims to segment the foreground objects in a video given the annotation of the first frame, has been attracting increasing attention. Many state-of-the-art approaches achieve strong performance by relying on online model updating or mask-propagation techniques. However, most online models incur a high computational cost because they fine-tune the model during inference. Most mask-propagation based models are faster but perform relatively poorly because they fail to adapt to variations in object appearance. In this paper, we aim to design a new model that strikes a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask-propagation and non-local techniques by matching pixels in reference and target frames. Since we bring in information from both the first and previous frames, our network is…
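The abstract describes localizing foreground objects by non-locally matching pixels in a target frame against reference frames (the annotated first frame and the previous frame) and propagating their masks. The NumPy sketch below illustrates that general idea only; it is not NPMCA-net's actual architecture. It assumes per-pixel feature embeddings are already computed. Several choices are hypothetical and made for illustration: the function names, cosine similarity with a softmax temperature, and a fixed weighted average (`alpha`) to fuse the first-frame and previous-frame cues.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_match(target_feat, ref_feat, ref_mask, temperature=0.1):
    """Propagate a reference mask to target pixels by non-local pixel matching.

    target_feat: (N_t, C) embeddings of target-frame pixels
    ref_feat:    (N_r, C) embeddings of reference-frame pixels
    ref_mask:    (N_r,) foreground probabilities of reference pixels
    Returns (N_t,) foreground probabilities for target pixels.
    """
    # L2-normalize so that dot products become cosine similarities.
    t = target_feat / (np.linalg.norm(target_feat, axis=1, keepdims=True) + 1e-8)
    r = ref_feat / (np.linalg.norm(ref_feat, axis=1, keepdims=True) + 1e-8)
    sim = (t @ r.T) / temperature   # (N_t, N_r): every target pixel vs. every reference pixel
    attn = softmax(sim, axis=1)     # each target pixel attends over all reference pixels
    return attn @ ref_mask          # similarity-weighted vote of reference labels

def segment_frame(target_feat, first_feat, first_mask, prev_feat, prev_mask,
                  alpha=0.5, temperature=0.1):
    """Hypothetical fusion of first-frame and previous-frame matching cues."""
    p_first = nonlocal_match(target_feat, first_feat, first_mask, temperature)
    p_prev = nonlocal_match(target_feat, prev_feat, prev_mask, temperature)
    return alpha * p_first + (1.0 - alpha) * p_prev

# Toy usage: one foreground and one background reference pixel in a 2-D feature space.
ref_feat = np.array([[1.0, 0.0], [0.0, 1.0]])
ref_mask = np.array([1.0, 0.0])
target = np.array([[1.0, 0.0], [0.0, 1.0]])
print(nonlocal_match(target, ref_feat, ref_mask))
```

The first frame gives a reliable but possibly outdated appearance cue. The previous frame tracks recent appearance changes but can accumulate errors. Combining both reflects the motivation stated in the abstract, although the real network learns this fusion rather than using a fixed weight.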
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
