MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer

Juntong Fang; Zequn Chen; Weiqi Zhang; Donglin Di; Xuancheng Zhang; Chengmin Yang; Yu-Shen Liu

arXiv:2603.05078·cs.CV·March 9, 2026

Open Access

TL;DR

MoRe is a feed-forward, attention-based 4D reconstruction network that disentangles dynamic motion from static structure in monocular videos, targeting real-time, high-quality dynamic scene reconstruction.

Contribution

MoRe introduces an attention-forcing strategy and grouped causal attention for efficient, temporally coherent 4D scene reconstruction from monocular videos.

Findings

01. Achieves high-quality dynamic reconstructions

02. Operates efficiently in real time

03. Outperforms existing methods on multiple benchmarks

Abstract

Reconstructing dynamic 4D scenes remains challenging because moving objects corrupt camera pose estimation. Existing optimization methods alleviate this issue with additional supervision, but they are mostly computationally expensive and impractical for real-time applications. To address these limitations, we propose MoRe, a feed-forward 4D reconstruction network that efficiently recovers dynamic 3D scenes from monocular videos. Built upon a strong static reconstruction backbone, MoRe employs an attention-forcing strategy to disentangle dynamic motion from static structure. To further enhance robustness, we fine-tune the model on large-scale, diverse datasets encompassing both dynamic and static scenes. Moreover, our grouped causal attention captures temporal dependencies and adapts to varying token lengths across frames, ensuring temporally coherent geometry…
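The abstract does not spell out how grouped causal attention is built, so the following is only a minimal sketch of one plausible reading: tokens are grouped by frame, and a token may attend to every token in its own frame and in earlier frames, but to none in later frames. Because the mask is derived from per-frame token counts, it handles frames with different numbers of tokens. The function name `grouped_causal_mask` and the parameter `tokens_per_frame` are hypothetical and do not come from the paper.

```python
def grouped_causal_mask(tokens_per_frame):
    """Build a frame-grouped causal attention mask (illustrative assumption,
    not the paper's confirmed design).

    tokens_per_frame: list of token counts, one entry per video frame.
    Returns a list of lists M where M[q][k] is True if query token q
    may attend to key token k.
    """
    # Frame index of every token, e.g. [2, 1] -> [0, 0, 1]
    frame_of = [f for f, n in enumerate(tokens_per_frame) for _ in range(n)]
    # A query may see keys from its own frame or any earlier frame.
    return [[fk <= fq for fk in frame_of] for fq in frame_of]


if __name__ == "__main__":
    # Three frames with 2, 1, and 3 tokens respectively.
    for row in grouped_causal_mask([2, 1, 3]):
        print("".join("x" if allowed else "." for allowed in row))
```

In an attention layer, positions where the mask is False would typically have their scores set to negative infinity before the softmax, so that later frames contribute nothing to earlier ones.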

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis