TransFlow: Motion Knowledge Transfer from Video Diffusion Models to Video Salient Object Detection

Suhwan Cho; Minhyeok Lee; Jungho Lee; Sunghun Yang; Sangyoun Lee

arXiv:2507.19789 · cs.CV · July 29, 2025

TL;DR

TransFlow transfers motion knowledge from pre-trained video diffusion models to generate realistic training data for video salient object detection (SOD), improving detection accuracy.

Contribution

The paper proposes TransFlow, a method that uses pre-trained video diffusion models to generate semantically aware optical flows from static images, which then serve as training data for video SOD.

Findings

01. Improved performance on multiple video SOD benchmarks.

02. Effective transfer of semantic motion priors from diffusion models.

03. Generation of realistic optical flows preserving spatial and temporal coherence.

Abstract

Video salient object detection (SOD) relies on motion cues to distinguish salient objects from backgrounds, but training such models is limited by scarce video datasets compared to abundant image datasets. Existing approaches that use spatial transformations to create video sequences from static images fail for motion-guided tasks, as these transformations produce unrealistic optical flows that lack semantic understanding of motion. We present TransFlow, which transfers motion knowledge from pre-trained video diffusion models to generate realistic training data for video SOD. Video diffusion models have learned rich semantic motion priors from large-scale video data, understanding how different objects naturally move in real scenes. TransFlow leverages this knowledge to generate semantically-aware optical flows from static images, where objects exhibit natural motion patterns while…
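
The abstract's point about spatial transformations can be illustrated with a small sketch. It is not taken from the paper: the helper names `affine_flow` and `mean_flow_gap`, the 64×64 grid, the rectangular mask, and the hand-built "object-aware" flow are all assumptions made for illustration. The sketch computes the dense flow induced by a global translation of a static image and compares the average motion inside and outside a hypothetical salient region. Under a global translation the two are identical, so the flow carries no cue that separates the object from the background.

```python
import numpy as np

def affine_flow(height, width, matrix, translation):
    """Dense flow (dx, dy) induced by warping every pixel with x' = A x + t."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    coords = np.stack([xs, ys], axis=-1)  # (H, W, 2) in (x, y) order
    warped = coords @ np.asarray(matrix, dtype=np.float64).T + np.asarray(translation)
    return warped - coords

def mean_flow_gap(flow, mask):
    """Distance between the mean flow inside the mask and the mean flow outside it."""
    inside = flow[mask].mean(axis=0)
    outside = flow[~mask].mean(axis=0)
    return float(np.linalg.norm(inside - outside))

H, W = 64, 64

# Hypothetical salient-object mask (a rectangle), for illustration only.
mask = np.zeros((H, W), dtype=bool)
mask[20:40, 24:44] = True

# Flow from a global translation of the whole image: every pixel moves alike.
translation_flow = affine_flow(H, W, np.eye(2), [3.0, -2.0])
gap_translation = mean_flow_gap(translation_flow, mask)

# Hand-built stand-in for an object-aware flow: only the object moves.
# This is NOT TransFlow output, just a contrast case.
object_flow = np.zeros((H, W, 2))
object_flow[mask] = [3.0, -2.0]
gap_object = mean_flow_gap(object_flow, mask)

print("global translation gap:", gap_translation)
print("object-aware gap:", gap_object)
```

The contrast case is deliberately simplistic. It only shows why a flow field needs object-dependent motion before it can act as a saliency cue, which is the gap the abstract says diffusion-derived motion priors are meant to fill.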
