TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis
Menghao Li, Zhenghao Zhang, Junchao Liao, Long Qin, Weizhi Wang

TL;DR
TransVDM is a diffusion-based model that generates transparent videos by combining a specialized autoencoder, a motion constraint module, and a large transparent video dataset.
Contribution
We introduce TransVDM, the first diffusion model tailored for transparent video generation, combining a new autoencoder, motion constraints, and a large transparent video dataset.
Findings
TransVDM effectively generates high-quality transparent videos.
The model reduces artifacts in transparent regions.
In experiments, TransVDM outperforms existing methods on benchmarks.
Abstract
Recent developments in Video Diffusion Models (VDMs) have demonstrated remarkable capability to generate high-quality video content. Nonetheless, the potential of VDMs for creating transparent videos remains largely uncharted. In this paper, we introduce TransVDM, the first diffusion-based model specifically designed for transparent video generation. TransVDM integrates a Transparent Variational Autoencoder (TVAE) and a pretrained UNet-based VDM, along with a novel Alpha Motion Constraint Module (AMCM). The TVAE captures the alpha channel transparency of video frames and encodes it into the latent space of the VDMs, facilitating a seamless transition to transparent video diffusion models. To improve the detection of transparent areas, the AMCM integrates motion constraints from the foreground within the VDM, helping to reduce undesirable artifacts. Moreover, we curate a dataset…
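For readers unfamiliar with the alpha channel the abstract refers to, the following is a minimal sketch of standard alpha compositing, the operation that makes a transparent video usable over arbitrary backgrounds. It is not the TVAE or the AMCM, and nothing here comes from the paper's code. The function names (`composite_pixel`, `composite_frame`, `composite_video`) and the nested-list frame layout are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Illustrative types (assumed, not from the paper): channels are floats in [0, 1].
RGBA = Tuple[float, float, float, float]  # (red, green, blue, alpha)
RGB = Tuple[float, float, float]


def composite_pixel(fg: RGBA, bg: RGB) -> RGB:
    """Blend one foreground pixel over a background pixel.

    Standard "over" compositing: out = alpha * fg + (1 - alpha) * bg.
    """
    r, g, b, a = fg
    return (
        a * r + (1.0 - a) * bg[0],
        a * g + (1.0 - a) * bg[1],
        a * b + (1.0 - a) * bg[2],
    )


def composite_frame(frame: List[List[RGBA]], bg: RGB) -> List[List[RGB]]:
    """Composite every pixel of an RGBA frame over a solid background color."""
    return [[composite_pixel(px, bg) for px in row] for row in frame]


def composite_video(video: List[List[List[RGBA]]], bg: RGB) -> List[List[List[RGB]]]:
    """Apply the same compositing to each frame of a transparent video."""
    return [composite_frame(frame, bg) for frame in video]


if __name__ == "__main__":
    # A 1x3 frame: opaque red, half-transparent red, fully transparent red.
    frame = [[(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.5), (1.0, 0.0, 0.0, 0.0)]]
    blue = (0.0, 0.0, 1.0)
    print(composite_video([frame], blue))
```

A fully transparent pixel (alpha 0) shows only the background, and a fully opaque pixel (alpha 1) shows only the foreground. Errors in the predicted alpha therefore appear directly as halo or ghosting artifacts after compositing, which is the kind of artifact in transparent regions the abstract says the motion constraints help reduce.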
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies · Image and Video Quality Assessment
