Learning Stochastic Bridges for Video Object Removal via Video-to-Video Translation

Zijie Lou; Xiangwei Feng; Jiaxin Wang; Jiangtao Yao; Fei Che; Tianbao Liu; Chengjing Wu; Xiaochao Qu; Luoqi Liu; Ting Liu

arXiv:2601.12066 · cs.CV · January 30, 2026

TL;DR

This paper introduces a video object removal method built on a stochastic bridge model that translates the input video directly into its object-free counterpart, using the input itself as a structural prior for more accurate and consistent removal.

Contribution

The paper proposes a stochastic bridge framework for video object removal that connects the source video (with objects) directly to the target video (objects removed), giving stronger guidance and better scene consistency than noise-initialized diffusion methods.

Findings

01 Outperforms existing methods in visual quality

02 Achieves superior temporal consistency

03 Effectively handles large object removal

Abstract

Existing video object removal methods predominantly rely on diffusion models following a noise-to-data paradigm, where generation starts from uninformative Gaussian noise. This approach discards the rich structural and contextual priors present in the original input video. Consequently, such methods often lack sufficient guidance, leading to incomplete object erasure or the synthesis of implausible content that conflicts with the scene's physical logic. In this paper, we reformulate video object removal as a video-to-video translation task via a stochastic bridge model. Unlike noise-initialized methods, our framework establishes a direct stochastic path from the source video (with objects) to the target video (objects removed). This bridge formulation effectively leverages the input video as a strong structural prior, guiding the model to perform precise removal while ensuring that the…
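To make the contrast with noise-initialized generation concrete, the following is a minimal sketch of sampling an intermediate state on a bridge pinned at both ends. It assumes a standard Brownian bridge as a stand-in, since the abstract does not state the exact bridge the authors use; the function name `brownian_bridge_sample`, the parameter `sigma`, and the toy four-pixel "frames" are all hypothetical and chosen only for illustration.

```python
import math
import random


def brownian_bridge_sample(source, target, t, sigma=1.0, rng=None):
    """Sample x_t on a Brownian bridge pinned at `source` (t=0) and `target` (t=1).

    Assumption: a plain Brownian bridge, not necessarily the paper's formulation.
    x_t = (1 - t) * source + t * target + sigma * sqrt(t * (1 - t)) * eps
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    if rng is None:
        rng = random.Random()
    # The noise scale vanishes at both endpoints, so the path starts exactly
    # at the source and ends exactly at the target.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * s + t * g + std * rng.gauss(0.0, 1.0)
        for s, g in zip(source, target)
    ]


# Toy "frames": the first two pixels belong to the object to be removed.
source_frame = [0.9, 0.8, 0.1, 0.2]  # hypothetical input with the object
target_frame = [0.1, 0.1, 0.1, 0.2]  # hypothetical clean target

rng = random.Random(0)
start = brownian_bridge_sample(source_frame, target_frame, 0.0, rng=rng)
middle = brownian_bridge_sample(source_frame, target_frame, 0.5, sigma=0.1, rng=rng)
end = brownian_bridge_sample(source_frame, target_frame, 1.0, rng=rng)
```

In this sketch the path begins at the source frame itself rather than at Gaussian noise, so every intermediate state stays a perturbed blend of the two videos. A noise-to-data sampler, by contrast, would start from a state that carries no information about the input.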

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Image Processing Techniques