Temporally stable video segmentation without video annotations
Aharon Azulay, Tavi Halperin, Orestis Vantzos, Nadav Borenstein, Ofir Bibi

TL;DR
This paper presents an unsupervised method to adapt still image segmentation models for stable, temporally consistent video segmentation by leveraging optical flow and a consistency measure validated against human judgment.
Contribution
It introduces an unsupervised approach that trains a multi-input multi-output decoder with an optical flow-based consistency loss, improving video segmentation stability without video annotations.
Findings
Enhanced temporal stability in video segmentation results
Minimal loss of accuracy compared to image-based models
Validated consistency measure correlates well with human judgment
Abstract
Temporally consistent dense video annotations are scarce and hard to collect. In contrast, image segmentation datasets (and pre-trained models) are ubiquitous, and easier to label for any novel task. In this paper, we introduce a method to adapt still image segmentation models to video in an unsupervised manner, by using an optical flow-based consistency measure. To ensure that the inferred segmented videos appear more stable in practice, we verify that the consistency measure is well correlated with human judgement via a user study. Training a new multi-input multi-output decoder using this measure as a loss, together with a technique for refining current image segmentation datasets and a temporal weighted-guided filter, we observe stability improvements in the generated segmented videos with minimal loss of accuracy.
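As a rough illustration of what an optical flow-based consistency measure can look like, the sketch below warps the previous frame's segmentation into the current frame and averages the per-pixel disagreement. It is a minimal sketch under stated assumptions, not the paper's formulation: the function names, the nearest-neighbour warping, the backward-flow convention (current frame to previous frame, in pixels, channel 0 = x, channel 1 = y), and the mean absolute difference are all illustrative choices.

```python
import numpy as np


def warp_nearest(prev_seg, backward_flow):
    """Warp prev_seg into the current frame (hypothetical helper).

    backward_flow[y, x] = (dx, dy) is assumed to point from pixel (x, y) in
    the current frame to its source location in the previous frame.
    """
    h, w = prev_seg.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour source coordinates in the previous frame.
    src_x = np.rint(xs + backward_flow[..., 0]).astype(int)
    src_y = np.rint(ys + backward_flow[..., 1]).astype(int)
    # Pixels whose source falls outside the frame are excluded from the score.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(prev_seg)
    warped[valid] = prev_seg[src_y[valid], src_x[valid]]
    return warped, valid


def temporal_inconsistency(prev_seg, curr_seg, backward_flow):
    """Mean absolute disagreement between curr_seg and the warped prev_seg."""
    warped, valid = warp_nearest(prev_seg, backward_flow)
    if not valid.any():
        return 0.0
    return float(np.abs(curr_seg - warped)[valid].mean())


# Toy example: a 2x2 mask that moves one pixel to the right between frames.
prev = np.zeros((6, 6))
prev[2:4, 2:4] = 1.0
curr = np.roll(prev, 1, axis=1)
flow = np.zeros((6, 6, 2))
flow[..., 0] = -1.0  # each current pixel came from one pixel to its left
print(temporal_inconsistency(prev, curr, flow))
```

A lower score means the two masks agree more closely once motion is accounted for. When a measure of this kind is used as a training loss, bilinear sampling is a common substitute for the rounding step; that choice is likewise an assumption here, not something the abstract specifies.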
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
