ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation
Hanxiao Sun, Mingxin Yang, Shuhui Yang, Zebin He, Xintong Han, Hongbo Fu, Chunchao Guo, Wenhan Luo

TL;DR
ROAR-3D is a lightweight method that upgrades a pretrained single-view 3D generative model to accept multiple unposed views, improving multi-view 3D generation quality without heavy training costs.
Contribution
It introduces a novel view routing mechanism and dual-stream attention to enable multi-view conditioning in pretrained models with minimal additional training.
Findings
Achieves state-of-the-art multi-view 3D generation quality.
Supports test-time view scaling from 1 to 12+ views.
Maintains high fidelity with minimal training overhead.
Abstract
Single-image-to-3D generative models can now produce high-quality geometry, yet conditioning on a single view inevitably introduces ambiguity about unseen regions. Multi-view conditioning can reduce this ambiguity, but existing methods either require fixed canonical viewpoints or rely on external reconstruction modules that impose heavy training costs and limit generation quality. We observe that pretrained single-view models already possess strong 2D-to-3D grounding that can be reused for multi-view conditioning. However, a closer analysis reveals that their conditioning mechanism entangles orientation control with geometry transfer, two functions that conflict when images from different viewpoints are naively combined. Based on this analysis, we propose ROAR-3D, a lightweight method that upgrades a pretrained single-view model to accept an arbitrary number of unposed images. A…
