ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning
Jian Shi, Zhenyu Li, Peter Wonka

TL;DR
ImmersePro is a novel end-to-end framework that converts single-view videos into stereo videos using implicit disparity learning, leveraging a new dataset and attention mechanisms for high-quality stereo synthesis.
Contribution
The paper introduces ImmersePro, a dual-branch architecture with implicit disparity guidance, together with a large-scale stereo video dataset, for improved stereo video generation.
Findings
Significant quantitative improvements over existing methods.
Effective stereo video synthesis from monocular videos.
Introduction of the large-scale YouTube-SBS dataset.
Abstract
We introduce ImmersePro, an innovative framework specifically designed to transform single-view videos into stereo videos. The framework uses a novel dual-branch architecture, comprising a disparity branch and a context branch, that operates on video data through spatial-temporal attention mechanisms. ImmersePro employs implicit disparity guidance, which lets it generate stereo pairs from video sequences without explicit disparity maps and thus reduces potential errors associated with disparity estimation models. In addition to these technical advances, we introduce the YouTube-SBS dataset, a comprehensive collection of 423 stereo videos sourced from YouTube. The dataset is unprecedented in scale, featuring over 7 million stereo pairs, and is designed to facilitate training and benchmarking of stereo video generation models. Our experiments demonstrate…
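To make the contrast with "explicit disparity maps" concrete, here is a minimal sketch of the conventional explicit pipeline that the abstract says ImmersePro avoids: a disparity map is estimated first, then each left-view pixel is forward-warped into the right view. This is not the paper's method. It assumes a rectified stereo pair with the convention that a left pixel at column x with integer disparity d lands at column x - d in the right view, and it works on a single scanline for brevity; the function name `warp_row_left_to_right` is hypothetical. It shows two failure points of the explicit route: any error in d moves a pixel to the wrong column, and disoccluded columns are left empty and must be filled separately.

```python
from typing import List, Optional


def warp_row_left_to_right(row: List[float], disparity: List[int]) -> List[Optional[float]]:
    """Forward-warp one scanline of a left view into the right view (hypothetical helper).

    Assumed convention: a left pixel at column x with disparity d lands at
    column x - d. When several pixels land on the same column, the one with the
    strictly larger disparity (closer to the camera) wins; on a tie the first
    one written is kept. Columns that no pixel lands on stay None (holes).
    """
    width = len(row)
    out: List[Optional[float]] = [None] * width
    best_d: List[int] = [-1] * width  # disparity of the pixel currently stored per column
    for x, (value, d) in enumerate(zip(row, disparity)):
        xr = x - d  # target column in the right view
        if 0 <= xr < width and d > best_d[xr]:
            out[xr] = value
            best_d[xr] = d
    return out


# A foreground object (disparity 2) at columns 2-3 shifts left, covering the
# background at columns 0-1 and exposing holes at columns 2-3.
print(warp_row_left_to_right([10.0, 20.0, 30.0, 40.0, 50.0], [0, 0, 2, 2, 0]))
# → [30.0, 40.0, None, None, 50.0]
```

In such a pipeline the holes would be handed to an inpainting stage, and the output quality is bounded by the accuracy of the estimated disparity; learning disparity implicitly, as the abstract describes, sidesteps committing to one explicit map.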
Taxonomy
Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies
Methods: Softmax · Attention Is All You Need
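The Methods tags point to softmax-based scaled dot-product attention, and the abstract says both branches operate on video through spatial-temporal attention. The sketch below shows one common way to factorize such attention: attend across the tokens within each frame (spatial), then across frames at each token position (temporal). It is an illustration under stated assumptions, not the paper's architecture: it uses Q = K = V = input features with no learned projections, a single head, and a spatial-then-temporal order, all of which are assumptions. The function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def softmax(scores: Vec) -> Vec:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(tokens: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product self-attention with Q = K = V = tokens (assumption)."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out: List[Vec] = []
    for q in tokens:
        # Attention weights of this query over all tokens.
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        # Output is the weighted average of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def spatial_temporal_attention(video: List[List[Vec]]) -> List[List[Vec]]:
    """video[t][n] is the feature vector of spatial token n in frame t (hypothetical layout)."""
    # Spatial step: attend across the N tokens inside each frame.
    spatial = [attention(frame) for frame in video]
    # Temporal step: attend across the T frames at each spatial position.
    num_frames, num_tokens = len(spatial), len(spatial[0])
    result: List[List[Vec]] = [[[] for _ in range(num_tokens)] for _ in range(num_frames)]
    for n in range(num_tokens):
        track = [spatial[t][n] for t in range(num_frames)]
        for t, vec in enumerate(attention(track)):
            result[t][n] = vec
    return result


# Usage: 3 frames, 4 tokens per frame, 2-dimensional features.
video = [[[float(t + n), float(t - n)] for n in range(4)] for t in range(3)]
out = spatial_temporal_attention(video)
print(len(out), len(out[0]), len(out[0][0]))  # → 3 4 2
```

Factorizing this way costs roughly T·N² + N·T² score computations instead of (T·N)² for full joint attention over all video tokens, which is a common reason to split spatial and temporal attention on video.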
