StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Guibao Shen; Yihua Du; Wenhang Ge; Jing He; Chirui Chang; Donghao Zhou; Zhen Yang; Luozhou Wang; Xin Tao; Ying-Cong Chen

arXiv:2512.16915·cs.CV·December 19, 2025

TL;DR

StereoPilot is an efficient feed-forward model for monocular-to-stereo video conversion. Trained on the large-scale UniStereo dataset, it handles both parallel and converged stereo formats and is reported to outperform existing methods in fidelity and efficiency.

Contribution

We introduce UniStereo, a large-scale unified dataset for stereo video conversion, and propose StereoPilot, a unified model that directly synthesizes the target view without explicit depth maps, improving accuracy and efficiency.

Findings

01 StereoPilot outperforms state-of-the-art methods in visual quality.

02 StereoPilot achieves higher computational efficiency than existing methods.

03 The unified dataset, covering both stereo formats, enables robust training and fair benchmarking.

Abstract

The rapid growth of stereoscopic displays, including VR headsets and 3D cinemas, has led to increasing demand for high-quality stereo video content. However, producing 3D videos remains costly and complex, while automatic Monocular-to-Stereo conversion is hindered by the limitations of the multi-stage "Depth-Warp-Inpaint" (DWI) pipeline. This paradigm suffers from error propagation, depth ambiguity, and format inconsistency between parallel and converged stereo configurations. To address these challenges, we introduce UniStereo, the first large-scale unified dataset for stereo video conversion, covering both stereo formats to enable fair benchmarking and robust model training. Building upon this dataset, we propose StereoPilot, an efficient feed-forward model that directly synthesizes the target view without relying on explicit depth maps or iterative diffusion sampling. Equipped with…
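
For orientation, the warp stage of the Depth-Warp-Inpaint pipeline that the abstract criticizes can be sketched roughly as follows. This is an illustrative toy, not the paper's code and not StereoPilot's method: the function names, the focal length, the baseline, and the one-row "image" are all assumptions, and it covers only the parallel-rig case using the standard pinhole relation d = f * B / Z. It shows why errors in the estimated depth propagate directly into pixel positions and why holes appear that a later inpainting stage has to fill.

```python
def disparity_from_depth(depth, focal_px, baseline):
    """Parallel-rig disparity in pixels: d = f * B / Z (pinhole model)."""
    return focal_px * baseline / depth


def forward_warp_row(row, depth_row, focal_px, baseline):
    """Warp one left-view scanline into the right view (hypothetical helper).

    Returns (warped_row, hole_mask). When two source pixels land on the
    same target, the nearer one wins. Targets that receive no pixel are
    marked as holes, which a DWI pipeline would pass to an inpainter.
    """
    width = len(row)
    warped = [None] * width
    z_buffer = [float("inf")] * width
    for x, (value, z) in enumerate(zip(row, depth_row)):
        d = disparity_from_depth(z, focal_px, baseline)
        target = x - int(round(d))  # content shifts left in the right view
        if 0 <= target < width and z < z_buffer[target]:
            warped[target] = value
            z_buffer[target] = z
    holes = [v is None for v in warped]
    return warped, holes


# Toy scanline: pixels "c" and "d" belong to a nearer object (depth 5).
row = list("abcdef")
depth_row = [10, 10, 5, 5, 10, 10]
warped, holes = forward_warp_row(row, depth_row, focal_px=20, baseline=0.5)
print(warped)  # ['c', 'd', None, 'e', 'f', None]
print(holes)   # [False, False, True, False, False, True]
```

In this toy case the hole at index 2 is a disocclusion behind the nearer object and the hole at index 5 comes from the frame boundary. Any error in `depth_row` would move pixels to the wrong target, which is the error-propagation issue the abstract attributes to multi-stage pipelines.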

Code & Models

Models

🤗 KlingTeam/StereoPilot (model)

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques