TL;DR
AnyRecon is a scalable 3D reconstruction framework that uses a video diffusion model with explicit geometric control, global scene memory, and efficient attention to handle sparse, unordered inputs and large viewpoint changes.
Contribution
It introduces a novel geometry-aware conditioning strategy and a scalable diffusion-based approach for arbitrary-view 3D reconstruction from sparse, unordered inputs.
Findings
Robust reconstruction across irregular inputs and large viewpoint gaps.
Supports scalable and explicit geometric control in 3D scene modeling.
Efficient diffusion distillation with sparse attention reduces computational complexity.
Abstract
Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric consistency and limits scalability to large or diverse scenes. We propose AnyRecon, a scalable framework for reconstruction from arbitrary and unordered sparse inputs that preserves explicit geometric control while supporting flexible conditioning cardinality. To support long-range conditioning, our method constructs a persistent global scene memory via a prepended capture view cache, and removes temporal compression to maintain frame-level correspondence under large viewpoint changes. Beyond a better generative model, we also find that the interplay between generation and…
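The abstract mentions a prepended capture view cache acting as global scene memory, and the findings mention sparse attention. One way to picture how these could fit together is a frame-level attention mask in which every frame can read the cached capture views while generated frames otherwise attend only to nearby generated frames. The sketch below is illustrative only: the function names `build_frame_mask` and `density`, the local-window pattern, and the choice that cached frames do not attend to generated frames are all assumptions, not details taken from the paper.

```python
def build_frame_mask(num_cache, num_target, window):
    """Return a frame-level boolean attention mask as a list of lists.

    mask[q][k] is True when query frame q may attend to key frame k.
    Frames 0..num_cache-1 are the prepended capture views (the cache);
    the remaining num_target frames are the views being generated.
    The local window is a hypothetical sparsity pattern for illustration.
    """
    total = num_cache + num_target
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_cache:
                # Every frame can read the cached capture views (global memory).
                mask[q][k] = True
            elif q >= num_cache and abs(q - k) <= window:
                # Generated frames also attend to nearby generated frames.
                mask[q][k] = True
    return mask


def density(mask):
    """Fraction of (query, key) frame pairs that are allowed."""
    total_pairs = len(mask) * len(mask[0])
    return sum(sum(row) for row in mask) / total_pairs


if __name__ == "__main__":
    # 2 cached capture views, 4 generated frames, local window of 1 frame.
    m = build_frame_mask(num_cache=2, num_target=4, window=1)
    for row in m:
        print("".join("x" if allowed else "." for allowed in row))
    print("density:", density(m))
```

Under this assumed pattern, the number of allowed frame pairs grows roughly linearly with the number of generated frames (cache size plus window width per row) rather than quadratically, which is the general sense in which sparse attention can reduce cost.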
