Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video

Zeren Jiang; Chuanxia Zheng; Iro Laina; Diane Larlus; Andrea Vedaldi

arXiv:2601.05251·cs.CV·January 9, 2026

TL;DR

Mesh4D is a feed-forward method for 4D mesh reconstruction from monocular video. It encodes an entire animation into a compact latent space and recovers shape and motion without requiring skeletal information at inference.

Contribution

The paper presents a novel autoencoder with spatio-temporal attention and a latent diffusion model for efficient 4D mesh reconstruction from monocular videos, without needing skeletal information during inference.

Findings

01 Outperforms prior methods in 3D shape and deformation accuracy

02 Enables stable, single-pass animation prediction

03 Achieves high-quality novel view synthesis

Abstract

We propose Mesh4D, a feed-forward model for monocular 4D mesh reconstruction. Given a monocular video of a dynamic object, our model reconstructs the object's complete 3D shape and motion, represented as a deformation field. Our key contribution is a compact latent space that encodes the entire animation sequence in a single pass. This latent space is learned by an autoencoder that, during training, is guided by the skeletal structure of the training objects, providing strong priors on plausible deformations. Crucially, skeletal information is not required at inference time. The encoder employs spatio-temporal attention, yielding a more stable representation of the object's overall deformation. Building on this representation, we train a latent diffusion model that, conditioned on the input video and the mesh reconstructed from the first frame, predicts the full animation in one shot.…
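To make two ideas from the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that "spatio-temporal attention" can be illustrated as plain single-head self-attention over tokens flattened across both the time and space axes, and that a "deformation field" can be illustrated as per-vertex displacements added to the first-frame mesh. The function names, tensor shapes, and projection matrices are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatio_temporal_attention(tokens, w_q, w_k, w_v):
    """Illustrative single-head self-attention over all (frame, token) pairs.

    tokens: array of shape (T, N, D) with T frames, N tokens per frame, D channels.
    Flattening to (T * N, D) lets every token attend across both space and time.
    This is an assumed, simplified form, not the paper's encoder.
    """
    T, N, D = tokens.shape
    flat = tokens.reshape(T * N, D)
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    out = weights @ v
    return out.reshape(T, N, -1), weights


def apply_deformation(rest_vertices, displacements):
    """Assumed deformation-field form: per-frame, per-vertex offsets.

    rest_vertices: (V, 3) vertices of the first-frame mesh.
    displacements: (T, V, 3) offsets for each frame.
    Returns (T, V, 3) animated vertex positions.
    """
    return rest_vertices[None, :, :] + displacements


rng = np.random.default_rng(0)
T, N, D = 4, 5, 8
tokens = rng.normal(size=(T, N, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))
out, weights = spatio_temporal_attention(tokens, w_q, w_k, w_v)

rest = rng.normal(size=(6, 3))
animated = apply_deformation(rest, np.zeros((T, 6, 3)))
```

Flattening time and space into one token axis is the simplest way to let information flow across frames; factorized variants that attend over space and time separately are a common alternative, and the abstract does not say which form the model uses.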

Taxonomy

Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Human Motion and Animation