Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video
Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi

TL;DR
Mesh4D is a feed-forward method for reconstructing 4D meshes from monocular video. It encodes an entire animation into a compact latent space and recovers both shape and motion without requiring skeletal information at inference.
Contribution
The paper presents an autoencoder with spatio-temporal attention that learns a compact latent space for whole animations, guided by skeletal structure during training only, together with a latent diffusion model that predicts the full animation from a monocular video and the mesh reconstructed from the first frame.
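To make the term "spatio-temporal attention" concrete, here is a minimal sketch of one common way to build it: attend across the tokens within each frame, then across frames for each token. This is an illustration only, not the paper's encoder. The factorized spatial-then-temporal layout, the absence of learned projections, the token shape (T, N, D), and the function names are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x has shape (..., L, D). For brevity, queries, keys, and values are all x
    itself (no learned projections, which a real model would have).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x               # (..., L, D)


def spatio_temporal_attention(tokens):
    """Hypothetical factorized attention over tokens of shape (T, N, D).

    T = frames, N = tokens per frame, D = feature size (assumed layout).
    """
    # Spatial step: each frame attends over its own N tokens.
    spatial = self_attention(tokens)                       # (T, N, D)
    # Temporal step: each token index attends over the T frames.
    temporal = self_attention(np.swapaxes(spatial, 0, 1))  # (N, T, D)
    return np.swapaxes(temporal, 0, 1)                     # (T, N, D)


# Toy input: 4 frames, 5 tokens per frame, 8 features per token.
tokens = np.random.default_rng(0).normal(size=(4, 5, 8))
out = spatio_temporal_attention(tokens)
```

The temporal step is what lets information from every frame reach every token, which matches the abstract's motivation of a more stable representation of the overall deformation.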
Findings
Outperforms prior methods in 3D shape and deformation accuracy
Enables stable, single-pass animation prediction
Achieves high-quality novel view synthesis
Abstract
We propose Mesh4D, a feed-forward model for monocular 4D mesh reconstruction. Given a monocular video of a dynamic object, our model reconstructs the object's complete 3D shape and motion, represented as a deformation field. Our key contribution is a compact latent space that encodes the entire animation sequence in a single pass. This latent space is learned by an autoencoder that, during training, is guided by the skeletal structure of the training objects, providing strong priors on plausible deformations. Crucially, skeletal information is not required at inference time. The encoder employs spatio-temporal attention, yielding a more stable representation of the object's overall deformation. Building on this representation, we train a latent diffusion model that, conditioned on the input video and the mesh reconstructed from the first frame, predicts the full animation in one shot.…
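As a rough illustration of what "motion represented as a deformation field" can look like in practice, the sketch below stores a reference mesh plus a per-frame, per-vertex displacement and adds the two to obtain the animated mesh. The abstract does not specify the parameterization, so the displacement-from-reference form, the array shapes, and the name `animate_mesh` are assumptions chosen for the example.

```python
import numpy as np


def animate_mesh(ref_vertices, displacements):
    """Apply a per-frame displacement field to a reference mesh.

    ref_vertices:  (V, 3) vertex positions of the reference (e.g. first-frame) mesh.
    displacements: (T, V, 3) offset of every vertex in every frame (assumed form).
    Returns an array of shape (T, V, 3) with the deformed vertices per frame.
    """
    ref_vertices = np.asarray(ref_vertices, dtype=float)
    displacements = np.asarray(displacements, dtype=float)
    if displacements.ndim != 3 or displacements.shape[1:] != ref_vertices.shape:
        raise ValueError("displacements must have shape (T, V, 3) matching ref_vertices")
    # Broadcasting adds the same reference positions to every frame's offsets.
    return ref_vertices[None, :, :] + displacements


# Toy example: a single triangle that slides along the x axis over 4 frames.
ref = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])  # connectivity is shared by all frames

num_frames = 4
disp = np.zeros((num_frames, 3, 3))
disp[:, :, 0] = np.linspace(0.0, 0.3, num_frames)[:, None]  # x offset grows per frame

frames = animate_mesh(ref, disp)
```

Because the faces are fixed and only vertex positions change, vertex correspondences hold across frames, which is what makes tracking possible in this kind of representation.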
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Human Motion and Animation
