EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video
Zhen Zhou, Yunkai Ma, Junfeng Fan, Shaolin Zhang, Fengshui Jing, Min Tan

TL;DR
EPRecon is a novel real-time framework that enhances panoptic 3D reconstruction from monocular videos by directly estimating scene depth and integrating detailed semantic features, outperforming existing methods in speed and accuracy.
Contribution
The paper introduces a lightweight module that estimates scene depth priors directly in a 3D volume, and it combines voxel features with the corresponding image features to improve panoptic segmentation accuracy.
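As a rough illustration of what combining voxel and image features can look like, the sketch below projects a voxel center into an image feature map and concatenates the sampled feature with the voxel's own feature. This is not the authors' architecture: the pinhole projection, nearest-neighbor sampling, plain concatenation, and the names `project` and `fuse_voxel_and_image` are all assumptions made for this example.

```python
import math

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates.

    Returns None when the point lies on or behind the image plane.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

def fuse_voxel_and_image(voxel_feat, center, image_feat, intrinsics):
    """Concatenate a voxel feature with the image feature at its projection.

    image_feat is a nested list indexed as [row][col][channel]. Voxels that
    project outside the image (or behind the camera) get a zero image feature.
    Hypothetical helper, not part of EPRecon.
    """
    height, width = len(image_feat), len(image_feat[0])
    channels = len(image_feat[0][0])
    sampled = [0.0] * channels
    uv = project(center, *intrinsics)
    if uv is not None:
        # Nearest-neighbor lookup (assumption; bilinear sampling is also common).
        col = math.floor(uv[0] + 0.5)
        row = math.floor(uv[1] + 0.5)
        if 0 <= row < height and 0 <= col < width:
            sampled = list(image_feat[row][col])
    return list(voxel_feat) + sampled

# Toy 2x2 feature map with 2 channels, and illustrative intrinsics (fx, fy, cx, cy).
image_feat = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8]],
]
intrinsics = (1.0, 1.0, 0.5, 0.5)

visible = fuse_voxel_and_image([1.0, 2.0], (0.5, 0.5, 1.0), image_feat, intrinsics)
behind = fuse_voxel_and_image([1.0, 2.0], (0.5, 0.5, -1.0), image_feat, intrinsics)
print(visible)
print(behind)
```

The first voxel projects to pixel (1, 1) and picks up that pixel's feature. The second lies behind the camera and falls back to zeros, so the fused vector keeps a fixed length either way.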
Findings
Outperforms state-of-the-art methods in reconstruction quality
Achieves real-time inference speed
Demonstrates effectiveness on the ScanNetV2 dataset
Abstract
Panoptic 3D reconstruction from a monocular video is a fundamental perceptual task in robotic scene understanding. However, existing efforts suffer from inefficiency in terms of inference speed and accuracy, limiting their practical applicability. We present EPRecon, an efficient real-time panoptic 3D reconstruction framework. Current volumetric-based reconstruction methods usually utilize multi-view depth map fusion to obtain scene depth priors, which is time-consuming and poses challenges to real-time scene reconstruction. To address this issue, we propose a lightweight module to directly estimate scene depth priors in a 3D volume for reconstruction quality improvement by generating occupancy probabilities of all voxels. In addition, compared with existing panoptic segmentation methods, EPRecon extracts panoptic features from both voxel features and corresponding image features,…
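The idea of using per-voxel occupancy probabilities as a depth prior can be sketched as follows. This is a minimal illustration rather than the paper's module: it assumes a network has already produced one logit per voxel, and the sigmoid mapping, the 0.5 threshold, and the function names are choices made for this example.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def occupancy_probabilities(logits):
    """Convert per-voxel logits {(i, j, k): logit} into occupancy probabilities."""
    return {index: sigmoid(value) for index, value in logits.items()}

def keep_occupied(probabilities, threshold=0.5):
    """Return the sorted voxel indices whose occupancy exceeds the threshold.

    The threshold value is an assumption for illustration only.
    """
    return sorted(i for i, p in probabilities.items() if p > threshold)

# Hypothetical logits for four voxels in a tiny volume.
logits = {
    (0, 0, 0): -3.0,
    (0, 0, 1): 0.2,
    (0, 1, 0): 2.5,
    (1, 0, 0): -0.1,
}

probabilities = occupancy_probabilities(logits)
occupied = keep_occupied(probabilities)
print(occupied)
```

Only voxels with a positive logit pass the 0.5 threshold here, so later stages would process the likely-occupied part of the volume instead of every voxel.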
Taxonomy
Topics: Advanced Optical Imaging Technologies · Image and Video Stabilization · Optical measurement and interference techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
