TL;DR
Fus3D is a fast, feed-forward method for dense 3D geometry reconstruction from unstructured images. It regresses a Signed Distance Field (SDF) directly from pretrained geometry-transformer features, without camera calibration or post-hoc fusion.
Contribution
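The paper introduces learned volumetric extraction, which builds 3D geometry directly from geometry-transformer features instead of routing them through per-view prediction heads and fusing the results post hoc. Skipping that detour keeps completeness information that per-view pipelines discard and avoids accumulating their errors, improving both reconstruction accuracy and efficiency.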
Findings
Achieves dense SDF regression in under three seconds.
Produces complete, plausible 3D geometry from both sparse and dense view sets.
Uses a scalable supervision scheme with depth maps or 3D assets.
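The findings do not say how depth maps are turned into SDF supervision. One common technique, which may or may not match the paper's scheme, is a projective truncated SDF: for points sampled along a camera ray, the target is the observed depth minus the point's depth, clipped to a truncation band, with points far behind the surface left unsupervised. The function name and parameters below are hypothetical.

```python
from typing import List, Optional


def projective_sdf_targets(depth: float, sample_depths: List[float],
                           truncation: float) -> List[Optional[float]]:
    """Hypothetical projective truncated-SDF targets for samples on one pixel ray.

    depth: observed depth of the surface along this ray.
    sample_depths: depths t of points sampled along the same ray.
    truncation: half-width of the band in which targets are kept.
    Returns one target per sample: positive in front of the surface,
    negative just behind it, or None where the point is occluded (unobserved).
    """
    targets: List[Optional[float]] = []
    for t in sample_depths:
        # Distance measured along the ray, not to the nearest surface point.
        signed = depth - t
        if signed < -truncation:
            targets.append(None)  # far behind the visible surface: no supervision
        else:
            targets.append(min(signed, truncation))  # clip free-space values to the band
    return targets


print(projective_sdf_targets(2.0, [1.0, 2.0, 3.0], 0.1))  # [0.1, 0.0, None]
```

Because the distance is measured along the ray, it overestimates the true distance on surfaces viewed at a grazing angle, which is one reason such targets are usually truncated.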
Abstract
We propose a feed-forward method for dense Signed Distance Field (SDF) regression from unstructured image collections in less than three seconds, without camera calibration or post-hoc fusion. Our key insight is that the intermediate feature space of pretrained multi-view feed-forward geometry transformers already encodes a powerful joint world representation; yet, existing pipelines discard it, routing features through per-view prediction heads before assembling 3D geometry post-hoc, which loses valuable completeness information and accumulates inaccuracies. We instead perform 3D extraction directly from geometry transformer features via learned volumetric extraction: voxelized canonical embeddings that progressively absorb multi-view geometry information through interleaved cross- and self-attention into a structured volumetric latent grid. A simple convolutional decoder then…
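To make the volumetric-extraction idea concrete, here is a minimal pure-Python sketch of voxel queries that alternately cross-attend to image tokens and self-attend to each other, followed by a decoder to per-voxel SDF values. It is not the authors' implementation: the names `volumetric_extraction` and `decode_sdf`, the grid size, embedding dimension, number of blocks, identity query/key/value projections, random "canonical" embeddings, and the reduction of the convolutional decoder to a 1x1x1 convolution are all simplifying assumptions.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(queries, keys_values):
    # Scaled dot-product attention with identity Q/K/V projections
    # (a simplification; a real model would learn these matrices).
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, keys_values)) for j in range(d)])
    return out


def add(a, b):
    # Residual connection: element-wise sum of two equally shaped token lists.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def volumetric_extraction(image_tokens, grid_size, dim, num_blocks=2, seed=0):
    """Hypothetical sketch: voxel queries absorb multi-view features.

    image_tokens: flattened multi-view transformer features (list of dim-vectors).
    Returns a latent grid indexed as grid[x][y][z] -> dim-vector.
    """
    rng = random.Random(seed)
    # Canonical voxel embeddings (learned in a real model; random here).
    voxels = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(grid_size ** 3)]
    for _ in range(num_blocks):
        # Cross-attention: every voxel reads from all image tokens.
        voxels = add(voxels, attend(voxels, image_tokens))
        # Self-attention: voxels exchange information with each other.
        voxels = add(voxels, attend(voxels, voxels))
    # Reshape the flat voxel list into a grid_size^3 latent grid.
    g = grid_size
    return [[[voxels[(x * g + y) * g + z] for z in range(g)]
             for y in range(g)] for x in range(g)]


def decode_sdf(grid, weights, bias=0.0):
    # 1x1x1 convolution (a per-voxel linear map) from latent features to one SDF value.
    return [[[sum(w * f for w, f in zip(weights, feat)) + bias for feat in row]
             for row in plane] for plane in grid]


if __name__ == "__main__":
    rng = random.Random(1)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]  # 20 image tokens
    grid = volumetric_extraction(tokens, grid_size=4, dim=8)
    sdf = decode_sdf(grid, weights=[0.1] * 8)
    print(len(sdf), len(sdf[0]), len(sdf[0][0]))  # 4 4 4
```

Because the voxel queries, not the images, define the output layout, the result is a fixed-size grid regardless of how many views are supplied; a real decoder would use larger 3D convolution kernels so neighboring voxels inform each SDF value.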
