YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
Botao Ye, Boqi Chen, Haofei Xu, Daniel Barath, and Marc Pollefeys

TL;DR
YoNoSplat is a fast, versatile feedforward model that reconstructs high-quality 3D Gaussian Splatting representations from an arbitrary number of images, whether posed or unposed, calibrated or uncalibrated, using a novel mixing training strategy.
Contribution
The paper introduces a novel mixing training strategy and camera normalization techniques that let a single model handle both pose-free and pose-dependent 3D scene reconstruction.
Findings
Reconstructs a scene from 100 views in 2.69 seconds on a GPU.
Achieves state-of-the-art results on standard benchmarks.
Operates effectively with both posed and unposed, calibrated and uncalibrated inputs.
Abstract
Fast and flexible 3D scene reconstruction from unstructured image collections remains a significant challenge. We present YoNoSplat, a feedforward model that reconstructs high-quality 3D Gaussian Splatting representations from an arbitrary number of images. Our model is highly versatile, operating effectively with both posed and unposed, calibrated and uncalibrated inputs. YoNoSplat predicts local Gaussians and camera poses for each view, which are aggregated into a global representation using either predicted or provided poses. To overcome the inherent difficulty of jointly learning 3D Gaussians and camera parameters, we introduce a novel mixing training strategy. This approach mitigates the entanglement between the two tasks by initially using ground-truth poses to aggregate local Gaussians and gradually transitioning to a mix of predicted and ground-truth poses, which prevents both…
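The two ideas in the abstract, aggregating per-view local Gaussians with a camera pose and gradually mixing predicted poses into training, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear ramp, the per-view random choice, and the names `mix_ratio`, `choose_poses`, `aggregate_centers`, `warmup_steps`, `ramp_steps`, and `max_ratio` are all assumptions made for illustration, and only Gaussian centers are transformed (a full model would also rotate the covariances).

```python
import numpy as np

def mix_ratio(step, warmup_steps, ramp_steps, max_ratio):
    """Probability of using a predicted pose at a training step.

    Hypothetical schedule: ground-truth poses only during warmup,
    then a linear ramp up to `max_ratio`.
    """
    if step < warmup_steps:
        return 0.0
    t = min(1.0, (step - warmup_steps) / max(1, ramp_steps))
    return max_ratio * t

def choose_poses(gt_poses, pred_poses, ratio, rng):
    """Pick, per view, the predicted pose with probability `ratio`.

    gt_poses, pred_poses: (V, 4, 4) camera-to-world matrices.
    Returns the mixed poses and a boolean mask of views using predictions.
    """
    use_pred = rng.random(len(gt_poses)) < ratio
    mixed = np.where(use_pred[:, None, None], pred_poses, gt_poses)
    return mixed, use_pred

def aggregate_centers(local_centers, cam_to_world):
    """Map per-view Gaussian centers into one shared world frame.

    local_centers: (V, N, 3) points in each camera's own frame.
    cam_to_world:  (V, 4, 4) poses (predicted, ground truth, or a mix).
    Returns a (V * N, 3) array of world-space centers.
    """
    rotation = cam_to_world[:, :3, :3]
    translation = cam_to_world[:, :3, 3]
    world = np.einsum("vij,vnj->vni", rotation, local_centers)
    world = world + translation[:, None, :]
    return world.reshape(-1, 3)
```

In a training loop under these assumptions, one would call `mix_ratio(step, ...)`, pass the result to `choose_poses`, and feed the mixed poses to `aggregate_centers` before rendering. At inference time, `aggregate_centers` would receive either the provided poses or the predicted ones.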
Peer Reviews
Decision·ICLR 2026 Poster
1. **Solid experimental coverage.** Results span multiple priors (p/k/none), multiple view counts (6/12/24; 32/64/128), and include cross-dataset tests (DL3DV→ScanNet++). The pose AUC comparisons further substantiate the quality of the predicted geometry.
2. **Methodical ablations.** The paper dissects (i) output space (local vs canonical), (ii) training regime (mix/self/teacher), (iii) normalization choices, (iv) ICE usefulness, and (v) Plücker rays, providing good insight into *why* choices ma

1. The proposed method predicts per-pixel Gaussians, which may become computationally inefficient for large-scale scenes (with large number of input images). While the paper introduces opacity regularization and Gaussian pruning to mitigate this, it does not quantify how much these steps actually reduce the number of Gaussians or memory footprint. A comparative analysis with AnySplat in terms of Gaussian count and memory efficiency would strengthen the claims about scalability.
2. ICE train–test

1. The paper is clearly written and easy to follow.
2. The paper's primary strength is the model's versatility. It is designed to handle a wide, practical range of input conditions: an arbitrary number of views, both posed and unposed, and both calibrated and uncalibrated images.
3. The model demonstrates state-of-the-art performance across multiple standard benchmarks.

Reliance on Post-Optimization Undermines the "Feedforward" Claim. The paper presents itself as a feedforward model, but its strongest results (e.g., in Table 1) rely on an "Optional Post-Optimization" step. This optimization is not feedforward and adds a significant time cost (e.g., 165s for 24 views). The large performance gap between the feedforward-only output and the optimized output suggests that the feedforward prediction is, by itself, substantially suboptimal. This weakens the central cl
- YoNoSplat is able to process images with or without camera pose/intrinsic information, where incorporating additional information of pose/intrinsics leads to improved performance, making it suitable for both in-the-wild settings and industry level deployment.
- The paper reveals multiple recipes such as scale normalization, mix-forcing, local estimation which is valuable for the research community for future development.
- YoNoSplat largely outperforms previous approaches, setting the new stat

- **Large computation and training time:** In Section 4.1, it is explained that YoNoSplat requires 16 GH200 GPUs for 150k steps, where the required computation seems to be extremely large. Adding additional comparison of the required computation with prior works would be nice to better understand the contributions of YoNoSplat.
- **Architectural Novelty:** The proposed architecture seems to be a direct extension of NoPoSplat with $\pi^3$, which makes the architectural novelty of the work a bit l
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Optical measurement and interference techniques
