Representing Volumetric Videos as Dynamic MLP Maps
Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou

TL;DR
This paper presents a new method for real-time rendering of dynamic volumetric videos using shallow MLP networks stored in 2D grids, enabling fast and storage-efficient view synthesis of complex scenes.
Contribution
The paper introduces MLP maps, a novel representation that combines shallow MLPs with a shared 2D CNN decoder for efficient dynamic scene rendering.
Findings
Achieves state-of-the-art quality on NHR and ZJU-MoCap datasets.
Real-time rendering at 41.7 fps for 512x512 images on an RTX 3090.
Renders faster than previous methods by using shallow MLPs, and needs less storage because MLP parameters are predicted by a shared 2D CNN instead of being stored explicitly.
Abstract
This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art…
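The lookup idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single MLP map on the xy-plane, a one-hidden-layer MLP per grid cell, and a random parameter grid standing in for the output of the shared 2D CNN decoder. All names and sizes (`HIDDEN`, `make_mlp_map`, `run_cell_mlp`, `query`) are hypothetical.

```python
import math
import random

# Hypothetical sizes chosen for illustration only.
IN_DIM = 3    # input: a 3D point (x, y, z)
HIDDEN = 8    # hidden width of each shallow MLP
OUT_DIM = 4   # output: density + RGB
N_PARAMS = IN_DIM * HIDDEN + HIDDEN + HIDDEN * OUT_DIM + OUT_DIM


def make_mlp_map(res, seed=0):
    """Stand-in for the shared 2D CNN decoder's output (assumption):
    a res x res grid whose cells each hold one MLP's flat parameters."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(N_PARAMS)]
             for _ in range(res)] for _ in range(res)]


def run_cell_mlp(params, point):
    """Unpack a flat parameter vector and evaluate a one-hidden-layer MLP."""
    i = 0
    w1 = [params[i + r * IN_DIM: i + (r + 1) * IN_DIM] for r in range(HIDDEN)]
    i += HIDDEN * IN_DIM
    b1 = params[i:i + HIDDEN]
    i += HIDDEN
    w2 = [params[i + r * HIDDEN: i + (r + 1) * HIDDEN] for r in range(OUT_DIM)]
    i += OUT_DIM * HIDDEN
    b2 = params[i:i + OUT_DIM]

    # Hidden layer with ReLU activation.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
              for row, b in zip(w1, b1)]
    out = [sum(w * h for w, h in zip(row, hidden)) + b
           for row, b in zip(w2, b2)]

    density = max(0.0, out[0])                           # non-negative density
    rgb = [1.0 / (1.0 + math.exp(-v)) for v in out[1:]]  # sigmoid to [0, 1]
    return density, rgb


def query(mlp_map, point):
    """Project a point in [0, 1)^3 onto the xy-plane, pick the grid cell
    it falls into, and run that cell's MLP on the point."""
    res = len(mlp_map)
    x, y, _ = point
    col = min(res - 1, max(0, int(x * res)))
    row = min(res - 1, max(0, int(y * res)))
    return run_cell_mlp(mlp_map[row][col], point)


if __name__ == "__main__":
    mlp_map = make_mlp_map(res=4)
    density, rgb = query(mlp_map, (0.1, 0.9, 0.5))
    print(density, rgb)
```

In this sketch each query touches only one small network, which is the property the abstract credits for fast rendering. Regenerating the grid from a shared decoder per frame, rather than storing one grid per frame, is what the abstract credits for low storage.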
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
