LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

Zhengqin Li; Cheng Zhang; Jakob Engel; Zhao Dong

arXiv:2604.05182 · cs.CV · April 8, 2026

TL;DR

LSRM is a scalable transformer-based model that improves high-fidelity 3D object reconstruction and inverse rendering by expanding the context window (more active object and image tokens) and using efficient sparse attention.

Contribution

Introduces a scalable architecture that combines native sparse attention, a coarse-to-fine pipeline, and 3D-aware spatial routing to improve 3D reconstruction and inverse rendering quality.

Findings

01. Handles 20x more object tokens than prior methods.

02. Achieves 2.5 dB higher PSNR in novel-view synthesis.

03. Reduces LPIPS by 40% compared to state-of-the-art.
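
To put the PSNR figure in context, the sketch below uses the standard PSNR definition to show what a 2.5 dB gain means in terms of mean squared error. The baseline MSE value and the helper names `psnr` and `mse_ratio_for_gain` are illustrative assumptions, not numbers or code from the paper.

```python
import math


def psnr(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given MSE and peak pixel value."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE must shrink for PSNR to rise by gain_db."""
    return 10.0 ** (gain_db / 10.0)


# Hypothetical baseline error, chosen only for illustration (images in [0, 1]).
baseline_mse = 1e-3
improved_mse = baseline_mse / mse_ratio_for_gain(2.5)

print(f"baseline PSNR: {psnr(baseline_mse):.2f} dB")
print(f"improved PSNR: {psnr(improved_mse):.2f} dB")
print(f"MSE reduction factor: {mse_ratio_for_gain(2.5):.3f}")
```

Because PSNR is logarithmic in MSE, a 2.5 dB gain corresponds to an MSE that is about 1.78 times smaller (a reduction of about 44%), whatever the baseline value.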

Abstract

We introduce the Large Sparse Reconstruction Model to study how scaling transformer context windows impacts feed-forward 3D reconstruction. Although recent object-centric feed-forward methods deliver robust, high-quality reconstruction, they still lag behind dense-view optimization in recovering fine-grained texture and appearance. We show that expanding the context window -- by substantially increasing the number of active object and image tokens -- remarkably narrows this gap and enables high-fidelity 3D object reconstruction and inverse rendering. To scale effectively, we adapt native sparse attention in our architecture design, unlocking its capacity for 3D reconstruction with three key contributions: (1) an efficient coarse-to-fine pipeline that focuses computation on informative regions by predicting sparse high-resolution residuals; (2) a 3D-aware spatial routing mechanism that…
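
The abstract does not spell out the attention design, so the following is only a generic sketch of top-k block-sparse attention, one common way to keep the cost manageable as the context window grows. It is not the paper's architecture: the function name `block_sparse_attention`, the mean-key block scoring, and the parameters `block_size` and `top_k` are all assumptions made for illustration.

```python
import math


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend from one query to only the top_k highest-scoring key blocks.

    q: list[float]; keys, values: list[list[float]] of equal length.
    Returns (output vector, sorted indices of the keys actually attended to).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Partition key indices into contiguous blocks.
    blocks = [list(range(s, min(s + block_size, len(keys))))
              for s in range(0, len(keys), block_size)]

    # Score each block cheaply: query dotted with the block's mean key.
    def block_score(idx):
        mean_key = [sum(keys[i][d] for i in idx) / len(idx)
                    for d in range(len(q))]
        return dot(q, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(i for b in ranked[:top_k] for i in blocks[b])

    # Scaled dot-product softmax attention restricted to the selected keys.
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, selected)) / total
           for d in range(len(values[0]))]
    return out, selected


# Toy data: two blocks of two keys each; the query matches the first block.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0], [1.0], [5.0], [5.0]]
out, selected = block_sparse_attention([1.0, 0.0], keys, values,
                                       block_size=2, top_k=1)
print(selected)  # [0, 1]
print(out)       # [1.0]
```

With `top_k` equal to the number of blocks, the function reduces to ordinary dense attention; smaller values trade exactness for a per-query cost that scales with `top_k * block_size` rather than with the full number of keys.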
