MUT3R: Motion-aware Updating Transformer for Dynamic 3D Reconstruction

Guole Shen; Tianchen Deng; Xingrui Qin; Nailin Wang; Jianyu Wang; Yanbo Wang; Yongtao Chen; Hesheng Wang; Jingchuan Wang

arXiv:2512.03939·cs.CV·December 4, 2025

Open Access

TL;DR

MUT3R is a training-free, motion-aware framework that improves dynamic 3D reconstruction by using attention cues already present in a pretrained transformer to suppress motion artifacts at inference time.

Contribution

The paper introduces MUT3R, a training-free method that uses attention-derived motion cues to improve the stability and accuracy of dynamic 3D reconstruction.

Findings

1. Improves temporal consistency in dynamic 3D reconstruction.
2. Enhances robustness to camera pose variations.
3. Does not require retraining or fine-tuning the transformer.

Abstract

Recent stateful recurrent neural networks have achieved remarkable progress on static 3D reconstruction but remain vulnerable to motion-induced artifacts, where non-rigid regions corrupt attention propagation between the spatial memory and the image features. By analyzing the internal behavior of the state and image token updating mechanism, we find that aggregating self-attention maps across layers reveals a consistent pattern: dynamic regions are naturally down-weighted, exposing an implicit motion cue that the pretrained transformer already encodes but never explicitly uses. Motivated by this observation, we introduce MUT3R, a training-free framework that applies the attention-derived motion cue to suppress dynamic content in the early layers of the transformer during inference. Our attention-level gating module suppresses the influence of dynamic regions before their artifacts propagate…
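To make the idea in the abstract concrete, here is a minimal NumPy sketch of the two steps it describes: aggregating attention maps across layers into a per-token motion cue, then using that cue to gate attention. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact aggregation or gating formulas, so the mean-over-layers aggregation, the min-max normalization, the additive log-bias gate, and the names `motion_cue` and `gated_attention` are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def motion_cue(attn_maps):
    """Aggregate attention maps into a per-key score in [0, 1].

    attn_maps: (layers, heads, queries, keys) attention probabilities.
    Assumption: a key that receives little attention on average is
    treated as likely dynamic (low score).
    """
    received = attn_maps.mean(axis=(0, 1, 2))  # attention each key receives
    lo, hi = received.min(), received.max()
    return (received - lo) / (hi - lo + 1e-8)

def gated_attention(q, k, v, cue, eps=1e-6):
    """Single-head attention with a hypothetical additive log-gate on keys."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    logits = logits + np.log(cue + eps)  # low-cue keys are pushed toward zero weight
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

# Toy demo with synthetic data: token 3 plays the role of a dynamic region.
rng = np.random.default_rng(0)
L, H, N, D = 4, 2, 6, 8
raw = rng.normal(size=(L, H, N, N))
raw[..., 3] -= 4.0                       # token 3 receives little attention
attn_maps = softmax(raw)
cue = motion_cue(attn_maps)

q, k, v = (rng.normal(size=(N, D)) for _ in range(3))
_, w_plain = gated_attention(q, k, v, np.ones(N))  # all-ones cue: no gating effect
_, w_gated = gated_attention(q, k, v, cue)
print("lowest-cue token:", int(cue.argmin()))
```

In this toy setup the gate is applied to a single attention layer. The abstract states that suppression happens in the early layers of the transformer, so a fuller version would apply the same bias only in those layers and leave later ones untouched.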

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Advanced Vision and Imaging