DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object Segmentation

Suhwan Cho; Minhyeok Lee; Jungho Lee; Donghyeong Kim; Sangyoun Lee

arXiv:2507.19790 · cs.CV · July 29, 2025

TL;DR

DepthFlow introduces a method that generates synthetic optical flow from single images via depth estimation. This greatly expands the training data available for unsupervised video object segmentation and yields state-of-the-art results.

Contribution

The paper presents a data synthesis technique that converts estimated depth maps into synthetic optical flow. This turns large image-mask datasets into image-flow-mask training data and increases training data diversity for two-stream VOS models.
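This summary does not spell out how DepthFlow maps depth to flow. As an illustration only, the Python sketch below uses a textbook parallax model, which is an assumption rather than the paper's rule: a virtual pinhole camera translates sideways by (tx, ty) without rotating, so each pixel's flow is inversely proportional to its depth. The names depth_to_flow and flow_magnitude and the parameters tx, ty, focal, and eps are hypothetical. The point it illustrates is the structural correlation: object boundaries that appear as depth discontinuities reappear as flow discontinuities.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Flow = List[List[Tuple[float, float]]]


def depth_to_flow(depth: Grid, tx: float, ty: float,
                  focal: float = 1.0, eps: float = 1e-6) -> Flow:
    """Hypothetical depth-to-flow rule (assumed, not the paper's method).

    For a pinhole camera translating by (tx, ty) without rotation, a static
    point at depth Z shifts by -focal * (tx, ty) / Z pixels, so nearer points
    move more than farther ones.
    """
    flow: Flow = []
    for row in depth:
        flow_row = []
        for z in row:
            inv_z = 1.0 / max(z, eps)  # clamp to eps to avoid dividing by zero or negative depth
            flow_row.append((-focal * tx * inv_z, -focal * ty * inv_z))
        flow.append(flow_row)
    return flow


def flow_magnitude(flow: Flow) -> Grid:
    """Per-pixel flow magnitude; its edges follow the edges of the depth map."""
    return [[math.hypot(u, v) for u, v in row] for row in flow]
```

For example, depth_to_flow([[1.0, 4.0], [1.0, 4.0]], tx=2.0, ty=0.0) gives the near left column a flow magnitude of 2.0 and the far right column 0.5, so the boundary between them shows up as a sharp change in flow even though no real motion was observed.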

Findings

01. Achieves state-of-the-art performance on unsupervised VOS benchmarks.
02. Enables large-scale training on synthetic image-flow-mask data.
03. Improves model robustness by exploiting the structural cues in flow maps.

Abstract

Unsupervised video object segmentation (VOS) aims to detect the most prominent object in a video. Recently, two-stream approaches that leverage both RGB images and optical flow have gained significant attention, but their performance is fundamentally constrained by the scarcity of training data. To address this, we propose DepthFlow, a novel data generation method that synthesizes optical flow from single images. Our approach is driven by the key insight that VOS models depend more on structural information embedded in flow maps than on their geometric accuracy, and that this structure is highly correlated with depth. We first estimate a depth map from a source image and then convert it into a synthetic flow field that preserves essential structural cues. This process enables the transformation of large-scale image-mask pairs into image-flow-mask training pairs, dramatically expanding…
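The abstract describes turning image-mask pairs into image-flow-mask training triplets. A minimal Python sketch of that bookkeeping follows, under stated assumptions: estimate_depth stands in for any monocular depth model, synth_flow uses an assumed lateral-parallax rule rather than the paper's actual conversion, images are toy nested lists, and Triplet, synth_flow, and expand_dataset are hypothetical names. The mask is reused unchanged because the synthetic flow is derived from the same image.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Grid = List[List[float]]
Flow = List[List[Tuple[float, float]]]


@dataclass
class Triplet:
    image: Grid  # toy single-channel image
    flow: Flow   # synthetic flow derived from the estimated depth
    mask: Grid   # ground-truth mask, reused unchanged


def synth_flow(depth: Grid, tx: float, ty: float, eps: float = 1e-6) -> Flow:
    # Assumed rule (not the paper's): lateral virtual camera shift, flow ~ 1 / depth.
    return [[(-tx / max(z, eps), -ty / max(z, eps)) for z in row] for row in depth]


def expand_dataset(pairs: Iterable[Tuple[Grid, Grid]],
                   estimate_depth: Callable[[Grid], Grid],
                   seed: int = 0) -> List[Triplet]:
    """Turn (image, mask) pairs into (image, flow, mask) triplets."""
    rng = random.Random(seed)  # seeded so the sampled camera motions are reproducible
    triplets = []
    for image, mask in pairs:
        depth = estimate_depth(image)  # hypothetical monocular depth estimator
        tx, ty = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)  # random virtual camera motion
        triplets.append(Triplet(image, synth_flow(depth, tx, ty), mask))
    return triplets


if __name__ == "__main__":
    # Toy stand-in for a depth network: brighter pixels are treated as nearer.
    def toy_depth(img: Grid) -> Grid:
        return [[1.0 + 3.0 * (1.0 - p) for p in row] for row in img]

    data = [([[1.0, 0.0]], [[1.0, 0.0]])]
    (t,) = expand_dataset(data, toy_depth)
    near, far = (math.hypot(*uv) for uv in t.flow[0])
    print(f"near={near:.3f} far={far:.3f}")
```

In the toy run, the bright "object" pixel receives a larger flow magnitude than the dark background pixel, mirroring the depth ordering. A practical version would likely operate on arrays and a trained depth network instead of these nested lists.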
