Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
Zhengze Xu, Mengting Chen, Zhao Wang, Linyu Xing, Zhonghua Zhai, Nong Sang, Jinsong Lan, Shuai Xiao, Changxin Gao

TL;DR
Tunnel Try-on introduces a diffusion-based framework that focuses on excavating spatial-temporal tunnels in videos to enhance detail preservation and motion coherence for high-quality virtual try-on applications.
Contribution
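The paper proposes a "focus tunnel": a sequence of close-up crops around the clothing region, smoothed across frames with a Kalman filter. The tunnel's position embedding is injected into the attention layers, and an environment encoder supplies context from outside the tunnel.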
Findings
Preserves clothing details effectively.
Ensures smooth and coherent motion in generated videos.
Achieves significant progress towards commercial virtual try-on applications.
Abstract
Video try-on is a challenging task and has not been well tackled in previous works. The main obstacle lies in preserving the details of the clothing and modeling the coherent motions simultaneously. Faced with those difficulties, we address video try-on by proposing a diffusion-based framework named "Tunnel Try-on." The core idea is excavating a "focus tunnel" in the input video that gives close-up shots around the clothing regions. We zoom in on the region in the tunnel to better preserve the fine details of the clothing. To generate coherent motions, we first leverage the Kalman filter to construct smooth crops in the focus tunnel and inject the position embedding of the tunnel into attention layers to improve the continuity of the generated videos. In addition, we develop an environment encoder to extract the context information outside the tunnels as supplementary cues. Equipped…
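The abstract mentions using a Kalman filter to construct smooth crops in the focus tunnel but does not give the state model here. As a rough illustration only, the sketch below smooths a sequence of per-frame crop boxes with an independent scalar Kalman filter per coordinate, using a random-walk (constant-position) model. The `(cx, cy, w, h)` box layout, the function names `kalman_smooth_1d` and `smooth_boxes`, and the noise variances are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]


def kalman_smooth_1d(values: Sequence[float],
                     process_var: float = 1e-2,
                     measure_var: float = 1.0) -> List[float]:
    """Scalar Kalman filter with a random-walk (constant-position) model."""
    if not values:
        return []
    estimate = values[0]      # start from the first measurement
    error_var = measure_var   # initial uncertainty of that estimate
    smoothed = [estimate]
    for z in values[1:]:
        # Predict: the state is assumed unchanged; uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement via the Kalman gain.
        gain = error_var / (error_var + measure_var)
        estimate = estimate + gain * (z - estimate)
        error_var = (1.0 - gain) * error_var
        smoothed.append(estimate)
    return smoothed


def smooth_boxes(boxes: Sequence[Box],
                 process_var: float = 1e-2,
                 measure_var: float = 1.0) -> List[Box]:
    """Smooth each box coordinate independently across frames."""
    if not boxes:
        return []
    channels = [
        kalman_smooth_1d([b[i] for b in boxes], process_var, measure_var)
        for i in range(4)
    ]
    return [
        (channels[0][t], channels[1][t], channels[2][t], channels[3][t])
        for t in range(len(boxes))
    ]


if __name__ == "__main__":
    # Jittery detections: the center x alternates by +/-5 pixels per frame.
    raw = [(100.0 + (5.0 if t % 2 else -5.0), 50.0, 80.0, 120.0)
           for t in range(20)]
    for box in smooth_boxes(raw)[:5]:
        print(box)
```

A larger `measure_var` relative to `process_var` trusts the per-frame detections less and yields steadier crops, at the cost of lagging behind fast motion. A constant-velocity model would track moving subjects more closely, but is left out to keep the sketch short.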
Taxonomy
Topics: Multimedia Communication and Technology · Image and Video Quality Assessment · Video Coding and Compression Technologies
Methods: Focus
