Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes

Haotong Lin; Sida Peng; Zhen Xu; Tao Xie; Xingyi He; Hujun Bao; Xiaowei Zhou

arXiv:2310.08585·cs.CV·October 13, 2023·2 cites

TL;DR

Im4D introduces a hybrid scene representation combining grid-based geometry and multi-view image-based appearance to achieve high-fidelity, real-time dynamic scene rendering, outperforming previous methods in quality and efficiency.

Contribution

The paper presents Im4D, a novel hybrid scene representation that effectively combines geometry and appearance modeling for dynamic view synthesis.

Findings

01 State-of-the-art rendering quality on five datasets.

02 Real-time rendering at 79.8 FPS for 512x512 images.

03 Efficient training on a single GPU.

Abstract

This paper aims to tackle the challenge of dynamic view synthesis from multi-view videos. The key observation is that while previous grid-based methods offer consistent rendering, they fall short in capturing appearance details of a complex dynamic scene, a domain where multi-view image-based rendering methods demonstrate the opposite properties. To combine the best of two worlds, we introduce Im4D, a hybrid scene representation that consists of a grid-based geometry representation and a multi-view image-based appearance representation. Specifically, the dynamic geometry is encoded as a 4D density function composed of spatiotemporal feature planes and a small MLP network, which globally models the scene structure and facilitates the rendering consistency. We represent the scene appearance by the original multi-view videos and a network that learns to predict the color of a 3D point from…
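The geometry half of the abstract (a 4D density function built from spatiotemporal feature planes and a small MLP) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the six-plane layout over (x, y, z, t), the plane resolution and feature width, combining plane features by concatenation, the two-layer MLP with random weights, and the softplus activation. The names `PLANE_AXES`, `bilinear_sample`, and `density` are hypothetical.

```python
import numpy as np

# Axis pairs for six planes over (x, y, z, t). This layout is an assumed
# illustration, not the paper's confirmed design.
PLANE_AXES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]


def bilinear_sample(plane, uv):
    """Bilinearly sample a (R, R, C) feature plane at uv in [0, 1]^2, shape (N, 2)."""
    res = plane.shape[0]
    xy = np.clip(uv, 0.0, 1.0) * (res - 1)
    # Lower corner index, capped so that the upper corner stays in bounds.
    i0 = np.minimum(np.floor(xy).astype(int), res - 2)
    w = xy - i0  # fractional offsets inside the cell
    u0, v0 = i0[:, 0], i0[:, 1]
    wu, wv = w[:, :1], w[:, 1:]
    return ((1 - wu) * (1 - wv) * plane[u0, v0]
            + wu * (1 - wv) * plane[u0 + 1, v0]
            + (1 - wu) * wv * plane[u0, v0 + 1]
            + wu * wv * plane[u0 + 1, v0 + 1])


def density(points_xyzt, planes, w1, b1, w2, b2):
    """Map (N, 4) normalized space-time points to (N,) non-negative densities."""
    feats = [bilinear_sample(p, points_xyzt[:, list(ax)])
             for p, ax in zip(planes, PLANE_AXES)]
    h = np.concatenate(feats, axis=1)   # assumed: concatenate plane features
    h = np.maximum(h @ w1 + b1, 0.0)    # hidden layer with ReLU
    raw = h @ w2 + b2
    return np.logaddexp(0.0, raw)[:, 0]  # softplus keeps density non-negative


# Hypothetical sizes and random weights, standing in for trained parameters.
rng = np.random.default_rng(0)
R, C, H = 16, 4, 32
planes = [rng.normal(size=(R, R, C)) for _ in PLANE_AXES]
w1, b1 = rng.normal(size=(6 * C, H)) * 0.1, np.zeros(H)
w2, b2 = rng.normal(size=(H, 1)) * 0.1, np.zeros(1)

pts = rng.uniform(size=(5, 4))  # five (x, y, z, t) samples in [0, 1]^4
sigma = density(pts, planes, w1, b1, w2, b2)
```

Because the planes are shared across all query points and time steps, a density defined this way is a single global function of (x, y, z, t), which is the property the abstract credits for rendering consistency. The image-based appearance branch is not sketched here.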

Code & Models

Repositories

zju3dv/im4d · PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings