TL;DR
LiveStre4m is a feed-forward model for real-time novel view synthesis from unposed multi-view video. It delivers stable, temporally consistent streaming without camera calibration, which makes it suitable for live applications.
Contribution
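LiveStre4m combines a multi-view vision transformer for keyframe 3D scene reconstruction, a diffusion-transformer interpolation module for temporal consistency, and a Camera Pose Predictor that estimates poses and intrinsics from RGB images. Together these form a single feed-forward pipeline that streams novel views in real time from uncalibrated, unposed multi-view video.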
Findings
Achieves 0.07s per-frame reconstruction at 1024x768 resolution.
Operates with as few as two synchronized unposed input streams.
Outperforms optimization-based methods in runtime by orders of magnitude.
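To put the first finding in more familiar units, the short sketch below converts the reported 0.07 s per frame at 1024x768 into an approximate frame rate and pixel throughput. It assumes the per-frame figure is the only cost in the loop, which the summary does not confirm; the function names are illustrative and not from the paper.

```python
def frames_per_second(seconds_per_frame: float) -> float:
    """Convert a per-frame latency into a frame rate."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return 1.0 / seconds_per_frame


def megapixels_per_second(width: int, height: int, seconds_per_frame: float) -> float:
    """Pixel throughput implied by a resolution and a per-frame latency."""
    return width * height * frames_per_second(seconds_per_frame) / 1e6


# Figures reported in the Findings list: 0.07 s per frame at 1024x768.
# Assumption: reconstruction time is the whole per-frame budget.
fps = frames_per_second(0.07)
mpx = megapixels_per_second(1024, 768, 0.07)
print(f"{fps:.1f} frames/s, {mpx:.1f} Mpx/s")  # 14.3 frames/s, 11.2 Mpx/s
```

Under that assumption the reported latency corresponds to roughly 14 frames per second.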
Abstract
Live-streaming Novel View Synthesis (NVS) from unposed multi-view video remains an open challenge in a wide range of applications. Existing methods for dynamic scene representation typically require ground-truth camera parameters and involve lengthy optimizations (s), which makes them unsuitable for live streaming scenarios. To address this issue, we propose a novel viewpoint video live-streaming method (LiveStre4m), a feed-forward model for real-time NVS from unposed sparse multi-view inputs. LiveStre4m introduces a multi-view vision transformer for keyframe 3D scene reconstruction coupled with a diffusion-transformer interpolation module that ensures temporal consistency and stable streaming. In addition, a Camera Pose Predictor module is proposed to efficiently estimate both poses and intrinsics directly from RGB images, removing the reliance on known camera calibration…
