RayMap3R: Inference-Time RayMap for Dynamic 3D Reconstruction
Feiran Wang, Zezhou Shang, Gaowen Liu, Yan Yan

TL;DR
RayMap3R is a training-free streaming framework for dynamic 3D scene reconstruction that leverages RayMap predictions to identify and suppress dynamic regions, improving real-time accuracy and stability.
Contribution
It introduces a dual-branch inference scheme that contrasts RayMap and image predictions to identify dynamic regions, together with reset metric alignment and state-aware smoothing, all without additional training.
Findings
Achieves state-of-the-art performance among streaming approaches on multiple benchmarks.
Effectively identifies dynamic regions using RayMap bias.
Improves stability and metric consistency in reconstructions.
Abstract
Streaming feed-forward 3D reconstruction enables real-time joint estimation of scene geometry and camera poses from RGB images. However, without explicit dynamic reasoning, streaming models can be affected by moving objects, causing artifacts and drift. In this work, we propose RayMap3R, a training-free streaming framework for dynamic scene reconstruction. We observe that RayMap-based predictions exhibit a static-scene bias, providing an internal cue for dynamic identification. Based on this observation, we construct a dual-branch inference scheme that identifies dynamic regions by contrasting RayMap and image predictions, suppressing their interference during memory updates. We further introduce reset metric alignment and state-aware smoothing to preserve metric consistency and stabilize predicted trajectories. Our method achieves state-of-the-art performance among streaming approaches…
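The dual-branch idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the relative-distance criterion, the threshold value, and the momentum-style memory blend are all assumptions chosen for illustration. It assumes each branch yields a per-pixel 3D point map in a shared frame, flags pixels where the two branches disagree as dynamic, and skips those pixels when updating a memory buffer.

```python
import numpy as np

def dynamic_mask(points_image, points_raymap, rel_threshold=0.1):
    """Flag pixels where the image and RayMap branches disagree.

    points_image, points_raymap: (H, W, 3) per-pixel 3D points in a shared frame.
    rel_threshold: hypothetical cutoff on the discrepancy relative to point range.
    Returns a boolean (H, W) mask, True where a pixel is treated as dynamic.
    """
    diff = np.linalg.norm(points_image - points_raymap, axis=-1)
    scale = np.linalg.norm(points_image, axis=-1) + 1e-8  # avoid division by zero
    return (diff / scale) > rel_threshold

def masked_memory_update(memory, new_features, mask, momentum=0.5):
    """Blend new features into memory at static pixels only.

    memory, new_features: (H, W, C) arrays; mask: (H, W) boolean dynamic mask.
    Dynamic pixels get weight 0, so their memory entries are left unchanged.
    """
    weight = np.where(mask, 0.0, momentum)[..., None]
    return (1.0 - weight) * memory + weight * new_features

if __name__ == "__main__":
    # Toy scene: a flat wall at depth 5, with a moving object seen at depth 4
    # by the image branch only (the RayMap branch keeps the static wall).
    raymap_pts = np.zeros((4, 4, 3))
    raymap_pts[..., 2] = 5.0
    image_pts = raymap_pts.copy()
    image_pts[1:3, 1:3, 2] = 4.0

    mask = dynamic_mask(image_pts, raymap_pts)
    memory = masked_memory_update(np.zeros((4, 4, 2)), np.ones((4, 4, 2)), mask)
    print(mask.astype(int))
    print(memory[..., 0])
```

A relative threshold is used here so that the same cutoff behaves similarly for near and far geometry; a real system would more likely operate on learned features or confidence maps than on raw point differences.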
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Human Pose and Action Recognition
