TL;DR
3DTV is a real-time, feedforward neural network for view synthesis that combines lightweight geometry with learning, enabling efficient and robust multi-view rendering without scene-specific retraining.
Contribution
It introduces a novel feedforward network for sparse-view interpolation that leverages a pose-aware depth module and Delaunay-based triplet selection, avoiding scene-specific optimization.
Findings
Outperforms recent real-time view synthesis baselines in quality and efficiency.
Operates without scene-specific retraining, which makes it suitable for AR/VR and telepresence.
Demonstrates robustness across diverse multi-view video datasets.
Abstract
Real-time free-viewpoint rendering requires balancing multi-camera redundancy with the latency constraints of interactive applications. We address this challenge by combining lightweight geometry with learning and propose 3DTV, a feedforward network for real-time sparse-view interpolation. A Delaunay-based triplet selection ensures angular coverage for each target view. Building on this, we introduce a pose-aware depth module that estimates a coarse-to-fine depth pyramid, enabling efficient feature reprojection and occlusion-aware blending. Unlike methods that require scene-specific optimization, 3DTV runs feedforward without retraining, making it practical for AR/VR, telepresence, and interactive applications. Our experiments on challenging multi-view video datasets demonstrate that 3DTV consistently achieves a strong balance of quality and efficiency, outperforming recent real-time…
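To make the Delaunay-based triplet selection concrete, here is a minimal sketch of one way such a step could work. It is not the authors' implementation. It assumes camera positions have already been mapped to 2D coordinates (for example, angles on a camera rig), triangulates them with a brute-force empty-circumcircle test, and returns the triangle that encloses the target view. The function names `delaunay_triangles` and `select_triplet`, the 2D parameterization, and the `None` result for targets outside the camera hull are all illustrative assumptions.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_circumcircle(a, b, c, p, eps=1e-12):
    """True if p lies strictly inside the circumcircle of CCW triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > eps


def delaunay_triangles(points, eps=1e-12):
    """Brute-force Delaunay triangulation: keep every non-degenerate triple
    whose circumcircle contains no other point. Fine for a handful of cameras;
    a real system would use an O(n log n) library routine instead."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        area2 = _orient(a, b, c)
        if abs(area2) <= eps:
            continue  # collinear cameras do not form a triangle
        if area2 < 0:
            b, c = c, b  # enforce counter-clockwise order for the circle test
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if not any(_in_circumcircle(a, b, c, p) for p in others):
            triangles.append((i, j, k))
    return triangles


def select_triplet(cameras, target, eps=1e-12):
    """Return the indices of the Delaunay triangle enclosing `target`,
    or None if the target lies outside the convex hull of the cameras
    (how to handle that case is an assumption left open here)."""
    for i, j, k in delaunay_triangles(cameras, eps):
        a, b, c = cameras[i], cameras[j], cameras[k]
        if _orient(a, b, c) < 0:
            b, c = c, b
        # Inside (or on the boundary) if the target is left of all three edges.
        if (_orient(a, b, target) >= -eps
                and _orient(b, c, target) >= -eps
                and _orient(c, a, target) >= -eps):
            return (i, j, k)
    return None


# Hypothetical rig: four cameras on a square plus one in the middle.
cameras = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
triplet = select_triplet(cameras, (1.0, 0.3))
```

Because the three selected cameras surround the target in this 2D parameterization, the target view is interpolated between its sources rather than extrapolated from one side, which is one plausible reading of the "angular coverage" the abstract mentions.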
