Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding
Nando Metzger, Prune Truong, Goutam Bhat, Konrad Schindler, Federico Tombari

TL;DR
Elastic3D is a novel end-to-end method that converts monocular videos into high-quality stereo videos using conditional latent diffusion with a guided VAE decoder. It avoids the artifacts of explicit depth estimation and warping and lets the user adjust the stereo strength at inference time.
Contribution
It introduces a guided VAE decoder within a latent diffusion framework that produces sharp, epipolar-consistent stereo video from monocular input, with a user-controllable disparity range.
Findings
Outperforms both traditional warping-based and recent warping-free baselines.
Sets a new standard for reliable, controllable stereo video conversion.
Evaluated on three datasets of real-world stereo videos.
Abstract
The growing demand for immersive 3D content calls for automated monocular-to-stereo video conversion. We present Elastic3D, a controllable, direct end-to-end method for upgrading a conventional video to a binocular one. Our approach, based on (conditional) latent diffusion, avoids artifacts due to explicit depth estimation and warping. The key to its high-quality stereo video output is a novel, guided VAE decoder that ensures sharp and epipolar-consistent stereo video output. Moreover, our method gives the user control over the strength of the stereo effect (more precisely, the disparity range) at inference time, via an intuitive, scalar tuning knob. Experiments on three different datasets of real-world stereo videos show that our method outperforms both traditional warping-based and recent warping-free baselines and sets a new standard for reliable, controllable stereo video…
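For context, the sketch below illustrates the traditional warping-based pipeline that the abstract contrasts Elastic3D against. It is not Elastic3D's method. It assumes a simple convention in which disparity is proportional to normalized inverse depth, and it uses a hypothetical scalar `strength` as a stand-in for a stereo-strength knob; Elastic3D instead applies its control inside the diffusion model, which this sketch does not reproduce. The function names `depth_to_disparity` and `forward_warp` are illustrative. Shifting pixels only along their own row reflects rectified (epipolar-aligned) stereo geometry, and the unfilled pixels that remain are the disocclusion holes that a warping pipeline must inpaint.

```python
import numpy as np


def depth_to_disparity(depth, max_disparity_px, strength=1.0):
    """Map depth to horizontal disparity in pixels (assumed convention).

    Disparity is taken proportional to inverse depth, normalized so that the
    farthest point gets 0 and the nearest gets strength * max_disparity_px.
    `strength` is a hypothetical scalar controlling the disparity range.
    """
    inv = 1.0 / np.clip(depth, 1e-6, None)
    span = max(inv.max() - inv.min(), 1e-12)
    return strength * max_disparity_px * (inv - inv.min()) / span


def forward_warp(left, disparity):
    """Synthesize a right view by shifting each left pixel by its disparity.

    Pixels move only within their own row (rectified stereo), from column x
    to column x - d. A z-buffer on disparity lets nearer pixels win
    collisions. Returns the warped view and a boolean mask of holes, i.e.
    target pixels that no source pixel landed on (disocclusions).
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    best = np.full((h, w), -np.inf)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            xt = int(round(x - d))
            if 0 <= xt < w and d > best[y, xt]:
                best[y, xt] = d
                right[y, xt] = left[y, x]
                filled[y, xt] = True
    return right, ~filled


if __name__ == "__main__":
    # Toy 1x8 grayscale frame: a near object (depth 1) on a far background (depth 10).
    left = np.arange(8, dtype=float).reshape(1, 8)
    depth = np.full((1, 8), 10.0)
    depth[0, 3:5] = 1.0
    for s in (0.0, 0.5, 1.0):
        disp = depth_to_disparity(depth, max_disparity_px=2.0, strength=s)
        _, holes = forward_warp(left, disp)
        print(f"strength={s}: {int(holes.sum())} hole pixel(s)")
```

In this toy frame the hole count grows with the stereo strength (0, 1, and 2 hole pixels for strengths 0.0, 0.5, and 1.0), which illustrates why larger disparity ranges make warping artifacts harder to hide.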
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
