Voxel-Aggregated Feature Synthesis: Efficient Dense Mapping for Simulated 3D Reasoning
Owen Burns, Rizwan Qureshi

TL;DR
VAFS is a novel method that significantly accelerates dense 3D mapping in simulation by synthesizing features from segmented point clouds, reducing redundant computation and improving accuracy.
Contribution
The paper introduces VAFS, a new approach that reduces computational load in dense 3D mapping by synthesizing features, enabling faster and more accurate semantic mapping in simulation.
Findings
VAFS outperforms prior methods in speed and accuracy.
Semantic IoU scores are higher with VAFS.
VAFS reduces the number of features to embed from the number of captured RGBD frames to the number of objects in the scene.
Abstract
We address the exploding computational requirements of recent state-of-the-art (SOTA) open-set multimodal 3D mapping (dense 3D mapping) algorithms and present Voxel-Aggregated Feature Synthesis (VAFS), a novel approach to dense 3D mapping in simulation. Dense 3D mapping involves segmenting and embedding sequential RGBD frames, which are then fused into 3D. This leads to redundant computation, because the differences between frames are small yet every frame is individually segmented and embedded. It also makes dense 3D mapping impractical for research involving embodied agents, in which the environment, and thus the mapping, must be modified regularly. VAFS drastically reduces this computation by using the segmented point cloud computed by a simulator's physics engine and synthesizing views of each region. This reduces the number of features to embed from the number of captured RGBD…
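To make the frames-to-objects reduction concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes the simulator supplies a point cloud with one region label per point, replaces view synthesis and the encoder with a hypothetical placeholder (`embed_region`), and assumes that per-voxel aggregation is a simple mean of point features, which the abstract does not specify. The names `voxel_aggregate`, `voxel_size`, and `dim` are illustrative.

```python
import numpy as np


def embed_region(region_id: int, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in for 'synthesize views of a region, then embed them'.

    Returns a deterministic unit vector per region id; a real system would
    call an image encoder here.
    """
    rng = np.random.default_rng(region_id)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)


def voxel_aggregate(points, labels, voxel_size=0.5, dim=8):
    """Embed each segmented region once, then average features per voxel.

    points: (N, 3) float array of 3D positions (assumed to come from the simulator).
    labels: (N,) int array of region ids, one per point.
    Returns (voxel_coords, voxel_feats, n_embeddings).
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)

    # One embedding per region, independent of how many frames were captured.
    region_feats = {int(r): embed_region(int(r), dim) for r in np.unique(labels)}

    # Broadcast each region's feature to the points that belong to it.
    point_feats = np.stack([region_feats[int(l)] for l in labels])

    # Assign every point to an integer voxel coordinate.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    voxel_coords, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # flatten; the shape differs across NumPy versions

    # Mean of point features inside each voxel (assumed aggregation rule).
    sums = np.zeros((len(voxel_coords), dim))
    np.add.at(sums, inverse, point_feats)
    counts = np.bincount(inverse, minlength=len(voxel_coords))
    voxel_feats = sums / counts[:, None]

    return voxel_coords, voxel_feats, len(region_feats)


if __name__ == "__main__":
    pts = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.1, 0.1, 0.1]]
    lbl = [0, 0, 1]
    coords, feats, n_embeddings = voxel_aggregate(pts, lbl)
    print(coords.shape, feats.shape, n_embeddings)
```

In this sketch the encoder is invoked once per region id, so the embedding cost scales with the number of segmented regions rather than with the number of frames, which is the reduction the abstract describes.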
Taxonomy
Topics: Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques
Methods: Sparse Evolutionary Training · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
