Hybrid-S2S: Video Object Segmentation with Recurrent Networks and Correspondence Matching
Fatemeh Azimi, Stanislav Frolov, Federico Raue, Joern Hees, and Andreas Dengel

TL;DR
This paper introduces HS2S, a hybrid RNN and correspondence matching architecture for one-shot video object segmentation, significantly improving accuracy and robustness over previous RNN-based methods, especially in challenging scenarios.
Contribution
The paper proposes a novel hybrid sequence-to-sequence model combining RNNs with correspondence matching to address drift and error propagation in VOS.
Findings
Achieves an 11.2 percentage point improvement on YouTube-VOS over previous RNN-based methods.
Reduces drift and error propagation in RNN-based VOS.
Improves segmentation quality under occlusion and in long sequences.
Abstract
One-shot Video Object Segmentation (VOS) is the task of pixel-wise tracking an object of interest within a video sequence, where the segmentation mask of the first frame is given at inference time. In recent years, Recurrent Neural Networks (RNNs) have been widely used for VOS tasks, but they often suffer from limitations such as drift and error propagation. In this work, we study an RNN-based architecture and address some of these issues by proposing a hybrid sequence-to-sequence architecture named HS2S, utilizing a dual mask propagation strategy that allows incorporating the information obtained from correspondence matching. Our experiments show that augmenting the RNN with correspondence matching is a highly effective solution to reduce the drift problem. The additional information helps the model to predict more accurate masks and makes it robust against error propagation. We…
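To make the idea of correspondence matching concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-pixel feature vectors are already available for a reference frame (for example the first frame, whose mask is given) and for the current frame. The function names `cosine`, `match_mask`, and `fuse`, the `temperature` parameter, and the fixed-weight fusion are all hypothetical choices made for illustration; the summary above does not specify how HS2S computes similarities or combines the two mask sources.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_mask(ref_feats, ref_mask, cur_feats, temperature=0.1):
    """Soft mask for the current frame via correspondence matching.

    ref_feats, cur_feats: lists of per-pixel feature vectors (flattened frames).
    ref_mask: list of 0/1 labels for the reference pixels.
    Each current pixel weights all reference pixels by a softmax over cosine
    similarity and takes the weighted average of their labels.
    """
    out = []
    for q in cur_feats:
        sims = [cosine(q, k) / temperature for k in ref_feats]
        m = max(sims)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in sims]
        total = sum(weights)
        out.append(sum(w * y for w, y in zip(weights, ref_mask)) / total)
    return out


def fuse(rnn_mask, matched_mask, alpha=0.5):
    """Assumed fusion rule: convex combination of two soft mask estimates."""
    return [alpha * r + (1 - alpha) * c for r, c in zip(rnn_mask, matched_mask)]


# Toy example: two reference pixels (object, background), two current pixels.
ref_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_mask = [1, 0]
cur_feats = [[0.9, 0.1], [0.1, 0.9]]
matched = match_mask(ref_feats, ref_mask, cur_feats)
combined = fuse([0.8, 0.3], matched)
print(matched, combined)
```

The point of the sketch is that the matched mask depends only on the reference frame, so it does not inherit errors accumulated by a recurrent prediction. That is one plausible reading of why a matching cue can counteract drift. A learned fusion would likely replace the fixed `alpha` in practice.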
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods
Methods: VOS
