Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes
Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hujun Bao,, Xiaowei Zhou

TL;DR
Im4D introduces a hybrid scene representation combining grid-based geometry and multi-view image-based appearance to achieve high-fidelity, real-time dynamic scene rendering, outperforming previous methods in quality and efficiency.
Contribution
The paper presents Im4D, a novel hybrid scene representation that effectively combines geometry and appearance modeling for dynamic view synthesis.
Findings
State-of-the-art rendering quality on five datasets.
Real-time rendering at 79.8 FPS for 512x512 images.
Efficient training on a single GPU.
Abstract
This paper aims to tackle the challenge of dynamic view synthesis from multi-view videos. The key observation is that while previous grid-based methods offer consistent rendering, they fall short in capturing appearance details of a complex dynamic scene, a domain where multi-view image-based rendering methods demonstrate the opposite properties. To combine the best of two worlds, we introduce Im4D, a hybrid scene representation that consists of a grid-based geometry representation and a multi-view image-based appearance representation. Specifically, the dynamic geometry is encoded as a 4D density function composed of spatiotemporal feature planes and a small MLP network, which globally models the scene structure and facilitates the rendering consistency. We represent the scene appearance by the original multi-view videos and a network that learns to predict the color of a 3D point from…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
