TL;DR
MV-SAM3D is a training-free framework that extends layout-aware 3D scene generation from single-view to multi-view input. It fuses observations adaptively across views and enforces physically plausible object layouts.
Contribution
The paper introduces a multi-view fusion method with adaptive weighting, combined with physics-aware optimization, to generate high-quality, physically plausible 3D scenes without additional training.
Findings
Improves reconstruction fidelity on standard benchmarks.
Enhances layout plausibility with physics-aware object arrangements.
Achieves significant quality improvements without extra training.
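As a rough illustration of what "layout plausibility" can mean in practice, the sketch below measures the two failure modes the abstract names: interpenetration and floating. It is not the paper's physics-aware optimization. The `Box` type, the axis-aligned box approximation, the z-up convention, and the ground plane at z = 0 are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    """Hypothetical axis-aligned bounding box for one object (z is up)."""
    min_xyz: Vec3
    max_xyz: Vec3


def overlap_volume(a: Box, b: Box) -> float:
    """Volume shared by two boxes; a positive value indicates interpenetration."""
    volume = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a.min_xyz, a.max_xyz, b.min_xyz, b.max_xyz):
        extent = min(hi_a, hi_b) - max(lo_a, lo_b)
        if extent <= 0.0:
            return 0.0  # separated along this axis, so no overlap at all
        volume *= extent
    return volume


def floating_gap(box: Box, others: Iterable[Box], ground_z: float = 0.0) -> float:
    """Vertical gap between the box bottom and the highest support beneath it.

    A support is the ground plane or the top face of any other box that
    overlaps this box in the xy-plane and starts below its bottom face.
    """
    bottom = box.min_xyz[2]
    support = ground_z
    for other in others:
        overlaps_xy = all(
            min(box.max_xyz[i], other.max_xyz[i]) > max(box.min_xyz[i], other.min_xyz[i])
            for i in (0, 1)
        )
        if overlaps_xy and other.min_xyz[2] < bottom:
            support = max(support, other.max_xyz[2])
    return max(0.0, bottom - support)
```

For a table spanning (0, 0, 0) to (1, 1, 1), a cup whose bottom sits at z = 1.2 gives a floating gap of about 0.2 and zero overlap, while a cup whose bottom sits at z = 0.9 gives a positive overlap volume and zero gap. A plausible layout would drive both quantities toward zero.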
Abstract
Recent unified 3D generation models have made remarkable progress in producing high-quality 3D assets from a single image. Notably, layout-aware approaches such as SAM3D can reconstruct multiple objects while preserving their spatial arrangement, opening the door to practical scene-level 3D generation. However, current methods are limited to single-view input and cannot leverage complementary multi-view observations, while independently estimated object poses often lead to physically implausible layouts such as interpenetration and floating artifacts. We present MV-SAM3D, a training-free framework that extends layout-aware 3D generation with multi-view consistency and physical plausibility. We formulate multi-view fusion as a Multi-Diffusion process in 3D latent space and propose two adaptive weighting strategies -- attention-entropy weighting and visibility weighting -- that enable…
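The weighted fusion idea can be sketched as a per-token weighted average of per-view latent predictions, in the spirit of a Multi-Diffusion step. This is a minimal sketch under stated assumptions, not the authors' implementation: the array shapes, the function names, the choice of `exp(-entropy)` as a confidence score (lower attention entropy treated as higher confidence), and the multiplication of the two weights are all hypothetical choices made for illustration.

```python
import numpy as np


def entropy_confidence(attn: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Map attention distributions to confidence scores.

    attn: (V, N, K) attention of each of N latent tokens over K keys, per view.
    Returns (V, N) scores in (0, 1]; peaked (low-entropy) attention scores higher.
    Assumption: low entropy signals a more reliable view for that token.
    """
    p = attn / (attn.sum(axis=-1, keepdims=True) + eps)
    entropy = -(p * np.log(p + eps)).sum(axis=-1)
    return np.exp(-entropy)


def fuse_views(
    preds: np.ndarray,
    attn: np.ndarray,
    visibility: np.ndarray,
    eps: float = 1e-12,
) -> np.ndarray:
    """Fuse per-view latent predictions into one latent.

    preds:      (V, N, C) per-view predictions for N latent tokens with C channels.
    attn:       (V, N, K) per-view attention distributions.
    visibility: (V, N) values in [0, 1]; 0 means the token is not seen in that view.
    Returns (N, C), a convex combination of the views for each token.
    """
    weights = entropy_confidence(attn, eps) * visibility  # assumed combination rule
    weights = weights / (weights.sum(axis=0, keepdims=True) + eps)
    return (weights[..., None] * preds).sum(axis=0)
```

In a sampler, a step like `fuse_views` would run once per denoising iteration, so that every view continues from the same fused latent. With equal attention and visibility across views, it reduces to a plain average.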
