Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction
Chen Ziwen, Hao Tan, Peng Wang, Zexiang Xu, Li Fuxin

TL;DR
Long-LRM++ introduces a semi-explicit scene representation paired with a lightweight decoder, enabling real-time, high-fidelity rendering of scenes reconstructed feed-forward from many input views and improving on prior methods in speed and quality.
Contribution
Long-LRM++ combines implicit and explicit scene representations into a semi-explicit design, and its lightweight decoder achieves real-time rendering while maintaining high fidelity and scalability.
Findings
Renders at 14 FPS on an A100 GPU while matching the rendering quality of LaCT.
Scales to 64 input views at 950×540 resolution and generalizes well.
Outperforms Gaussian-based methods in depth prediction on ScanNetv2.
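As a rough worked example of what the 14 FPS figure implies, assume a simple cost model (an assumption made here, not one stated in the paper): the scene is reconstructed once, and each novel view is then decoded independently. Rendering $F$ views then costs

$$T(F) = T_{\text{recon}} + F \cdot T_{\text{frame}}, \qquad T_{\text{frame}} = \frac{1}{14\ \text{s}^{-1}} \approx 71.4\ \text{ms},$$

so a 100-frame fly-through spends about $100 \times 71.4\ \text{ms} \approx 7.1\ \text{s}$ in the per-frame term. Under this model, the per-frame term dominates once many views are rendered. That is why decoding every frame with the full transformer or TTT backbone, as implicit methods do, is the bottleneck that a lightweight decoder targets.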
Abstract
Recent advances in generalizable Gaussian splatting (GS) have enabled feed-forward reconstruction of scenes from tens of input views. Long-LRM notably scales this paradigm to 32 input images at resolution, achieving 360° scene-level reconstruction in a single forward pass. However, directly predicting millions of Gaussian parameters at once remains highly error-sensitive: small inaccuracies in positions or other attributes lead to noticeable blurring, particularly in fine structures such as text. In parallel, implicit representation methods such as LVSM and LaCT have demonstrated significantly higher rendering fidelity by compressing scene information into model weights rather than explicit Gaussians, and decoding RGB frames using the full transformer or TTT backbone. However, this computationally intensive decompression process for every rendered frame makes…
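The error sensitivity of directly predicted Gaussians can be illustrated with a toy 1D experiment. This is a hypothetical sketch, not taken from the paper, and the helper names (`render_1d`, `relative_error`, `jitter_error`) and all widths and spacings are illustrative assumptions. The same absolute position error distorts narrow, text-like strokes far more than broad structures, because the rendering error depends on the jitter relative to each Gaussian's width.

```python
import math
import random


def render_1d(centers, width, xs):
    """Splat unit-amplitude 1D Gaussians of standard deviation `width` onto sample points `xs`."""
    return [sum(math.exp(-0.5 * ((x - c) / width) ** 2) for c in centers) for x in xs]


def relative_error(pred, target):
    """RMS error between two signals, normalized by the RMS of the target."""
    n = len(target)
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)
    den = math.sqrt(sum(t * t for t in target) / n)
    return num / den


def jitter_error(width, spacing, sigma, n_features=20, seed=0):
    """Relative rendering error after perturbing each Gaussian center by N(0, sigma^2).

    The noise is a toy stand-in for imprecise feed-forward position
    predictions; every parameter here is hypothetical.
    """
    rng = random.Random(seed)
    centers = [i * spacing for i in range(n_features)]
    # Sample at a fixed fraction of the spacing so both scales get equal resolution.
    xs = [k * spacing / 20 for k in range(n_features * 20)]
    target = render_1d(centers, width, xs)
    noisy = [c + rng.gauss(0.0, sigma) for c in centers]
    return relative_error(render_1d(noisy, width, xs), target)


if __name__ == "__main__":
    sigma = 0.01  # identical absolute position error for both scales
    fine = jitter_error(width=0.02, spacing=0.1, sigma=sigma)   # thin, text-like strokes
    coarse = jitter_error(width=0.2, spacing=1.0, sigma=sigma)  # broad, smooth structures
    print(f"fine: {fine:.3f}  coarse: {coarse:.3f}")
```

With the same seed, both scales receive identical normalized noise, so only the ratio `sigma / width` differs. The fine pattern's error therefore comes out several times larger. This is a qualitative illustration of the abstract's argument, not a model of Long-LRM or its training.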
