Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias
Zhiyuan Xu, Jiuming Liu, Yuxin Chen, Masayoshi Tomizuka, Chenfeng Xu, Chensheng Peng

TL;DR
SparseGen introduces a fast, efficient image-to-3D generation framework using sparse learned queries, reducing input-view bias and memory usage while maintaining high fidelity.
Contribution
SparseGen proposes a novel sparse, query-based scene modeling approach that outperforms dense methods in efficiency and input-view bias reduction without requiring 3D supervision.
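The abstract describes the representation as a sparse set of learned 3D anchor queries plus a learned expansion operator that decodes each query into a small local set of 3D Gaussians. The sketch below illustrates that idea only under stated assumptions. The per-child linear heads, the `tanh`-bounded offsets, the isotropic scale, and the names `expand_query`/`expand_scene` are all hypothetical. The transformer stage that "transforms" the queries is omitted, so query features are simply given as input. It is not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: tuple      # 3D center (x, y, z)
    scale: float     # isotropic scale (simplification; 3DGS normally uses per-axis scales + rotation)
    opacity: float   # in (0, 1)


def expand_query(anchor, feature, weights, radius):
    """Decode one anchor query into len(weights) local Gaussians.

    Hypothetical expansion operator: child i applies a 5 x d weight matrix
    weights[i] to the query feature, giving a 3D offset, a log-scale and an
    opacity logit. tanh bounds each offset coordinate to (-radius, radius),
    so every child stays near its anchor.
    """
    children = []
    for w in weights:
        out = [sum(wj * fj for wj, fj in zip(row, feature)) for row in w]
        offset = [radius * math.tanh(v) for v in out[:3]]
        mean = tuple(a + o for a, o in zip(anchor, offset))
        scale = math.exp(out[3])
        opacity = 1.0 / (1.0 + math.exp(-out[4]))
        children.append(Gaussian(mean, scale, opacity))
    return children


def expand_scene(anchors, features, weights, radius=0.05):
    """Expand N (anchor, feature) queries into N * K Gaussians (weights shared across queries)."""
    scene = []
    for anchor, feature in zip(anchors, features):
        scene.extend(expand_query(anchor, feature, weights, radius))
    return scene


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, n = 16, 8, 4  # feature width, children per query, number of queries
    weights = [[[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(5)] for _ in range(k)]
    anchors = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(n)]
    features = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    scene = expand_scene(anchors, features, weights)
    print(len(scene))  # 32 Gaussians from 4 queries
```

Because the expansion weights are shared, the number of primitives is fixed by the query count N and the children per query K rather than by the input image resolution.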
Findings
Significant reduction in memory and inference time.
Lower input-view bias compared to dense methods.
Maintains multi-view fidelity with sparse queries.
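One way to see where part of the memory saving can come from is to count output primitives. A pixel-aligned head emits Gaussians in proportion to views × height × width, while a sparse-query model emits N × K regardless of input resolution. The sketch below does that arithmetic. Every size in it (4 views at 256×256, 2048 queries, 16 children each) is hypothetical and is not a figure reported for SparseGen. It also counts only the output primitives, not network activations.

```python
# Illustrative primitive-count arithmetic; all sizes below are hypothetical.
# Floats per Gaussian for a basic 3DGS primitive with degree-0 color:
# mean (3) + scale (3) + rotation quaternion (4) + opacity (1) + RGB (3).
FLOATS_PER_GAUSSIAN = 14
BYTES_PER_FLOAT = 4  # float32


def pixel_aligned_count(views, height, width, per_pixel=1):
    """Gaussians emitted by a pixel-aligned head: one set per input pixel."""
    return views * height * width * per_pixel


def sparse_query_count(num_queries, per_query):
    """Gaussians emitted by N anchor queries, each expanded into K children."""
    return num_queries * per_query


def megabytes(num_gaussians):
    """Storage for the output primitives only (network activations excluded)."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT / 1e6


if __name__ == "__main__":
    dense = pixel_aligned_count(views=4, height=256, width=256)   # 262144
    sparse = sparse_query_count(num_queries=2048, per_query=16)   # 32768
    print(dense / sparse)                          # 8.0
    print(megabytes(dense), megabytes(sparse))     # ~14.68 MB vs ~1.84 MB
```

Doubling the number of input views doubles the pixel-aligned count but leaves the sparse-query count unchanged.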
Abstract
We present SparseGen, a novel framework for efficient image-to-3D generation that exhibits low input-view bias while being significantly faster than dense alternatives. Unlike traditional approaches that rely on dense volumetric grids, triplanes, or pixel-aligned primitives, we model scenes with a compact, sparse set of learned 3D anchor queries and a learned expansion operator that decodes each transformed query into a small local set of 3D Gaussian primitives. Trained under a rectified-flow reconstruction objective without 3D supervision, our model learns to allocate representation capacity where geometry and appearance matter, achieving significant reductions in memory and inference time while preserving multi-view fidelity. We introduce quantitative measures of input-view bias and utilization to show that sparse queries reduce overfitting to conditioning views while remaining representationally efficient. Our…
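For readers unfamiliar with rectified flow, the sketch below shows the generic objective on plain vectors. It samples noise x0, interpolates along the straight line x_t = (1 − t)·x0 + t·x1, and regresses a velocity model onto the constant target x1 − x0. Sampling is done by Euler integration from t = 0 to t = 1. This summary does not say what SparseGen's flow transports (for example, Gaussian parameters or query states) or how the rendering-based reconstruction loss is attached, so none of that is modeled here. The function names and the single-point `oracle` velocity are hypothetical test constructs.

```python
import random


def interpolate(x0, x1, t):
    """Straight-line path x_t = (1 - t) * x0 + t * x1 (x0: noise, x1: data)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def rectified_flow_loss(velocity_fn, x1, rng):
    """One Monte Carlo sample of the rectified-flow objective.

    The regression target is the constant velocity x1 - x0 of the straight path.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()  # uniform in [0, 1)
    xt = interpolate(x0, x1, t)
    target = [b - a for a, b in zip(x0, x1)]
    pred = velocity_fn(xt, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(x1)


def euler_sample(velocity_fn, x0, steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.5, -1.0, 2.0]  # stand-in for a single training target

    # For a single data point, the exact velocity field is (x1 - x_t) / (1 - t).
    def oracle(xt, t):
        return [(b - a) / (1.0 - t) for a, b in zip(xt, data)]

    print(rectified_flow_loss(oracle, data, rng))            # ~0 up to rounding
    print(euler_sample(oracle, [0.0, 0.0, 0.0], steps=4))    # lands on data
```

Because the training paths are straight lines, a well-fit velocity field can be integrated in few steps. This is the usual motivation for rectified flow when inference time matters.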
