StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation

Ke Xing; Xiaojie Jin; Longfei Li; Yuyang Yin; Hanwen Liang; Guixun Luo; Chen Fang; Jue Wang; Konstantinos N. Plataniotis; Yao Zhao; Yunchao Wei

arXiv:2512.09363 · cs.CV · December 12, 2025

TL;DR

StereoWorld converts monocular video into stereo video by repurposing a pretrained video generator, supervising it with a geometry-aware regularization, and training on a large high-definition stereo dataset. The authors report better visual fidelity and 3D structural accuracy than prior methods.

Contribution

The paper introduces an end-to-end monocular-to-stereo video generation method trained with geometry-aware supervision, together with a new high-definition stereo video dataset for training and evaluation.

Findings

01 Outperforms prior methods in visual fidelity

02 Ensures geometric consistency in generated stereo videos

03 Supports high-resolution synthesis with efficient tiling
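
The listing does not describe how the tiling works, so the following is only a generic sketch of overlapping spatio-temporal tiling, not the authors' scheme. It splits a (T, H, W, C) video into overlapping windows, applies a per-tile function, and blends the results with ramped weights. The names `tile_starts`, `blend_weights`, and `process_tiled`, and the default tile and overlap sizes, are hypothetical choices made for illustration.

```python
import numpy as np

def tile_starts(length, tile, overlap):
    """Start indices of overlapping windows that cover [0, length)."""
    if tile >= length:
        return [0]
    stride = tile - overlap  # assumes overlap < tile
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last window sits flush with the end
    return starts

def blend_weights(tile, overlap):
    """1D weights: linear ramps at both ends, flat in the middle."""
    w = np.ones(tile)
    if overlap > 0:
        ramp = np.linspace(0, 1, overlap + 2)[1:-1]  # strictly positive
        w[:overlap] = ramp
        w[-overlap:] = ramp[::-1]
    return w

def process_tiled(video, fn, tile=(4, 8, 8), overlap=(2, 4, 4)):
    """Apply fn to overlapping (T, H, W) tiles of a (T, H, W, C) array
    and blend the outputs by weighted averaging."""
    dims = video.shape[:3]
    sizes = [min(t, d) for t, d in zip(tile, dims)]
    ovs = [min(o, s - 1) for o, s in zip(overlap, sizes)]
    wt, wh, ww = (blend_weights(s, o) for s, o in zip(sizes, ovs))
    # Separable 3D weight, with a trailing axis to broadcast over channels.
    w = (wt[:, None, None] * wh[None, :, None] * ww[None, None, :])[..., None]

    out = np.zeros_like(video, dtype=np.float64)
    norm = np.zeros(dims + (1,), dtype=np.float64)
    for t0 in tile_starts(dims[0], sizes[0], ovs[0]):
        for y0 in tile_starts(dims[1], sizes[1], ovs[1]):
            for x0 in tile_starts(dims[2], sizes[2], ovs[2]):
                sl = (slice(t0, t0 + sizes[0]),
                      slice(y0, y0 + sizes[1]),
                      slice(x0, x0 + sizes[2]))
                out[sl] += w * fn(video[sl])
                norm[sl] += w
    return out / norm  # every voxel is covered, so norm > 0
```

With an identity `fn`, the blended output reproduces the input, which is a quick sanity check that the windows cover the whole volume and the weights are normalized correctly. In a real pipeline `fn` would be the (much more expensive) generator call on one tile.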

Abstract

The growing adoption of XR devices has fueled strong demand for high-quality stereo video, yet its production remains costly and artifact-prone. To address this challenge, we present StereoWorld, an end-to-end framework that repurposes a pretrained video generator for high-fidelity monocular-to-stereo video generation. Our framework jointly conditions the model on the monocular video input while explicitly supervising the generation with a geometry-aware regularization to ensure 3D structural fidelity. A spatio-temporal tiling scheme is further integrated to enable efficient, high-resolution synthesis. To enable large-scale training and evaluation, we curate a high-definition stereo video dataset containing over 11M frames aligned to natural human interpupillary distance (IPD). Extensive experiments demonstrate that StereoWorld substantially outperforms prior methods, generating stereo…
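
To illustrate why the stereo baseline (here, the IPD) matters for geometric fidelity, the sketch below uses the standard rectified pinhole stereo relation, disparity = focal length × baseline / depth. This is textbook stereo geometry, not something taken from the paper. The function name `disparity_px` and the default baseline of 0.063 m (a commonly quoted adult IPD) are assumptions made for illustration.

```python
import numpy as np

def disparity_px(depth_m, focal_px, baseline_m=0.063):
    """Horizontal disparity in pixels for a rectified stereo pair.

    depth_m    : scene depth in metres (scalar or array, > 0)
    focal_px   : focal length in pixels
    baseline_m : distance between the two views in metres
                 (default is an assumed IPD, not a value from the paper)
    """
    depth_m = np.asarray(depth_m, dtype=np.float64)
    return focal_px * baseline_m / depth_m

# Nearer points shift more between the two views than distant ones.
depths = np.array([0.5, 2.0, 10.0])
shifts = disparity_px(depths, focal_px=1000.0)
```

Because disparity scales linearly with the baseline, a stereo pair captured at a baseline far from the viewer's IPD presents depth cues that differ from what the viewer would naturally see at that scale.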

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis