Rethinking the State Update Gate for Long-Sequence Recurrent 3D Reconstruction
Kejun Ren, Lei Jin, Tianxin Huang, Lianming Xu, and Li Wang

TL;DR
This paper introduces a frame-level gating mechanism for long-sequence recurrent 3D reconstruction that improves accuracy while keeping memory constant, with no additional training or parameters.
Contribution
It proposes a parameter-free, closed-form, scalar frame-level gate derived from frame-to-frame feature changes, addressing a structural bottleneck in how recurrent 3D reconstruction updates its state.
Findings
Reduces long-sequence drift by 51% on TUM-RGBD
Decreases depth estimation error by 12.8% on Bonn video depth
Outperforms existing methods on KITTI long-sequence pose estimation
Abstract
Streaming 3D reconstruction under a strict constant-memory budget hinges on how the recurrent state is updated as the stream evolves. We profile TTT3R-style per-token gates across five benchmarks and discover a structural bottleneck: the gate is intrinsically bounded in magnitude (median ; never exceeding ) and nearly frame-invariant, yielding an effective memory horizon of only 3 frames per state token, which serves as the structural origin of long-sequence drift. We trace this to a missing axis: existing inference-time methods modulate updates only at the per-token, intra-frame level, while the orthogonal frame-level question of how strongly each frame should contribute to the state has been treated as content-independent. We close this gap with a scalar frame-level gate derived in closed form from frame-to-frame changes of internal…
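The abstract is cut off before the closed form is stated, so the sketch below is not the authors' formula. It only illustrates, under loudly stated assumptions, how a scalar frame-level gate could sit on top of a per-token gate in a constant-memory state update. The function names `frame_gate` and `update_state`, the relative-change formula, and the choice that larger feature change yields a stronger update are all hypothetical.

```python
import numpy as np


def frame_gate(prev_feat, curr_feat, eps=1e-8):
    """Hypothetical scalar frame-level gate in [0, 1].

    Assumption (not from the paper): the gate is the relative change
    between the internal features of two consecutive frames.
    """
    change = np.linalg.norm(curr_feat - prev_feat)
    # By the triangle inequality, change <= ||curr|| + ||prev||,
    # so the ratio below always stays within [0, 1].
    scale = np.linalg.norm(curr_feat) + np.linalg.norm(prev_feat) + eps
    return float(change / scale)


def update_state(state, candidate, token_gate, alpha):
    """Move the recurrent state toward a candidate update.

    state, candidate: arrays of shape (num_tokens, dim).
    token_gate: per-token gate of shape (num_tokens, 1), values in [0, 1].
    alpha: scalar frame-level gate shared by every token of the frame.
    The state keeps its shape, so memory stays constant over the stream.
    """
    gate = alpha * token_gate  # frame-level scalar modulates per-token gate
    return state + gate * (candidate - state)
```

In this sketch a frame whose features match the previous frame gets `alpha = 0` and leaves the state untouched, while a frame with large feature change is allowed to overwrite more of it. Whether the actual method scales the update in this direction cannot be determined from the truncated abstract.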
