Link to the Past: Temporal Propagation for Fast 3D Human Reconstruction from Monocular Video

Matthew Marchellus, Nadhira Noor, and In Kyu Park

arXiv:2505.07333 · cs.CV · May 13, 2025

TL;DR

TemPoFast3D is a novel method for fast, high-quality 3D human reconstruction from monocular video that leverages temporal coherence to achieve real-time performance with minimal computational redundancy.

Contribution

The paper introduces a plug-and-play approach that transforms existing pixel-aligned reconstruction networks so they can efficiently handle continuous video streams using temporal information.

Findings

01. Achieves up to 12 FPS in reconstruction speed.

02. Matches or exceeds state-of-the-art quality metrics.

03. Maintains high-quality textured reconstructions across diverse poses.

Abstract

Fast 3D clothed human reconstruction from monocular video remains a significant challenge in computer vision, particularly in balancing computational efficiency with reconstruction quality. Current approaches either focus on static image reconstruction but are too computationally intensive, or achieve high quality through per-video optimization that requires minutes to hours of processing, making them unsuitable for real-time applications. To this end, we present TemPoFast3D, a novel method that leverages the temporal coherency of human appearance to reduce redundant computation while maintaining reconstruction quality. Our approach is a "plug-and-play" solution that uniquely transforms pixel-aligned reconstruction networks to handle continuous video streams by maintaining and refining a canonical appearance representation through efficient coordinate mapping. Extensive experiments…
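To make the idea of a canonical appearance representation more concrete, here is a minimal Python sketch. It is not the authors' implementation. The `CanonicalCache` class, the dictionary-based pixel-to-canonical mapping, the scalar appearance values, and the running-average refinement are all assumptions chosen for illustration. The sketch only shows the general pattern the abstract describes: appearance already stored for a canonical point is reused and cheaply refined, and the expensive per-pixel inference runs only for points not seen before.

```python
from typing import Callable, Dict, Hashable, Tuple

Pixel = Tuple[int, int]


class CanonicalCache:
    """Hypothetical cache of appearance values keyed by canonical coordinates."""

    def __init__(self, blend: float = 0.5) -> None:
        self.blend = blend                      # weight of the new observation (assumed)
        self.store: Dict[Hashable, float] = {}  # canonical key -> appearance value
        self.expensive_calls = 0                # how often the costly path was taken

    def update(
        self,
        observed: Dict[Pixel, float],
        to_canonical: Dict[Pixel, Hashable],
        infer: Callable[[float], float],
    ) -> Dict[Pixel, float]:
        """Process one frame and return the appearance assigned to each pixel."""
        out: Dict[Pixel, float] = {}
        for pixel, value in observed.items():
            key = to_canonical[pixel]
            if key in self.store:
                # Seen before: refine the cached value with a cheap running average.
                old = self.store[key]
                self.store[key] = (1 - self.blend) * old + self.blend * value
            else:
                # Not seen before: fall back to the expensive inference stand-in.
                self.expensive_calls += 1
                self.store[key] = infer(value)
            out[pixel] = self.store[key]
        return out


def demo() -> Tuple[int, float]:
    cache = CanonicalCache(blend=0.5)
    infer = lambda v: v  # stand-in for a pixel-aligned network (assumption)
    # Frame 1: two canonical points are visible for the first time.
    cache.update({(0, 0): 1.0, (0, 1): 2.0},
                 {(0, 0): "chest", (0, 1): "arm"}, infer)
    # Frame 2: "chest" moved to another pixel; "leg" is newly visible.
    out = cache.update({(1, 0): 3.0, (1, 1): 4.0},
                       {(1, 0): "chest", (1, 1): "leg"}, infer)
    return cache.expensive_calls, out[(1, 0)]
```

In `demo()`, the second frame triggers the expensive path only for the newly visible point, while the point that moved is served from the cache and blended with its new observation. How the real method computes the coordinate mapping and performs refinement is not specified in the text above.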

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings