DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object Segmentation
Suhwan Cho, Minhyeok Lee, Jungho Lee, Donghyeong Kim, Sangyoun Lee

TL;DR
DepthFlow introduces a novel method to generate synthetic optical flow from single images using depth estimation, significantly enhancing training data for unsupervised video object segmentation and achieving state-of-the-art results.
Contribution
The paper presents a new data synthesis technique that leverages depth maps to generate optical flow, improving training data diversity for VOS models.
Findings
Achieves state-of-the-art performance on VOS benchmarks.
Enables large-scale training by converting image-mask pairs into synthetic image-flow-mask pairs.
Improves model robustness through structural flow cues.
Abstract
Unsupervised video object segmentation (VOS) aims to detect the most prominent object in a video. Recently, two-stream approaches that leverage both RGB images and optical flow have gained significant attention, but their performance is fundamentally constrained by the scarcity of training data. To address this, we propose DepthFlow, a novel data generation method that synthesizes optical flow from single images. Our approach is driven by the key insight that VOS models depend more on structural information embedded in flow maps than on their geometric accuracy, and that this structure is highly correlated with depth. We first estimate a depth map from a source image and then convert it into a synthetic flow field that preserves essential structural cues. This process enables the transformation of large-scale image-mask pairs into image-flow-mask training pairs, dramatically expanding…
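The depth-to-flow step described in the abstract can be illustrated with a rough sketch. The abstract does not specify the conversion itself, so everything below is an assumption made for illustration: a simple parallax-style rule in which nearer pixels receive larger displacement along one randomly chosen direction. The function name `depth_to_flow` and the parameter `max_disp` are hypothetical and do not come from the paper.

```python
import numpy as np


def depth_to_flow(depth, max_disp=20.0, rng=None):
    """Turn a depth map (H, W) into a synthetic flow field (H, W, 2).

    Assumption (not from the paper): larger depth values mean farther away,
    and displacement shrinks linearly with normalized depth, as in simple
    camera-translation parallax.
    """
    rng = np.random.default_rng() if rng is None else rng
    depth = np.asarray(depth, dtype=np.float32)

    # Normalize depth to [0, 1]; guard against a constant depth map.
    d_min, d_max = depth.min(), depth.max()
    norm = (depth - d_min) / max(d_max - d_min, 1e-6)

    # Nearness is 1 for the closest pixel and 0 for the farthest.
    nearness = 1.0 - norm

    # One random unit direction shared by all pixels (hypothetical choice).
    theta = rng.uniform(0.0, 2.0 * np.pi)
    direction = np.array([np.cos(theta), np.sin(theta)], dtype=np.float32)

    # Per-pixel (dx, dy): magnitude follows nearness, direction is global.
    return max_disp * nearness[..., None] * direction


# Usage: a toy 2x2 depth map standing in for a depth estimator's output.
toy_depth = np.array([[1.0, 2.0], [3.0, 4.0]])
toy_flow = depth_to_flow(toy_depth, max_disp=10.0, rng=np.random.default_rng(0))
toy_magnitude = np.linalg.norm(toy_flow, axis=-1)
```

The resulting flow is not geometrically accurate, but its magnitude map keeps the object boundaries present in the depth map, which is the kind of structural cue the abstract argues VOS models rely on. Pairing such a flow field with the source image and its mask would yield one image-flow-mask training sample.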
