Link to the Past: Temporal Propagation for Fast 3D Human Reconstruction from Monocular Video
Matthew Marchellus, Nadhira Noor, and In Kyu Park

TL;DR
TemPoFast3D is a method for fast, high-quality 3D human reconstruction from monocular video. It exploits the temporal coherence of human appearance to avoid redundant per-frame computation, reaching up to 12 FPS.
Contribution
It introduces a plug-and-play approach that transforms existing pixel-aligned networks to efficiently handle continuous video streams using temporal information.
Findings
Achieves up to 12 FPS in reconstruction speed.
Matches or exceeds state-of-the-art quality metrics.
Maintains high-quality textured reconstructions across diverse poses.
Abstract
Fast 3D clothed human reconstruction from monocular video remains a significant challenge in computer vision, particularly in balancing computational efficiency with reconstruction quality. Current approaches either focus on static image reconstruction but are too computationally intensive, or achieve high quality through per-video optimization that requires minutes to hours of processing, making them unsuitable for real-time applications. To this end, we present TemPoFast3D, a novel method that leverages the temporal coherence of human appearance to reduce redundant computation while maintaining reconstruction quality. Our approach is a "plug-and-play" solution that transforms pixel-aligned reconstruction networks to handle continuous video streams by maintaining and refining a canonical appearance representation through efficient coordinate mapping. Extensive experiments…
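The abstract does not say how the canonical appearance representation is stored or refined. As a purely illustrative sketch, and not the authors' implementation, the idea of carrying appearance forward across frames can be pictured as a buffer of canonical points that is refined by a confidence-weighted running average. The function name update_canonical, the flat per-point buffer, the integer index mapping, and the averaging rule are all assumptions made for this example.

```python
import numpy as np

def update_canonical(canon_rgb, canon_weight, coords, colors, conf):
    """Blend one frame's colour observations into a canonical buffer.

    Hypothetical helper for illustration only (not from the paper).
    canon_rgb:    (N, 3) float array, running colour estimate per canonical point
    canon_weight: (N,)   float array, accumulated confidence per canonical point
    coords:       (M,)   int array, canonical index each observation maps to
    colors:       (M, 3) float array, colours observed in the current frame
    conf:         (M,)   float array, non-negative confidence per observation
    """
    # Accumulate confidence-weighted colours for this frame.
    # np.add.at is unbuffered, so repeated indices in coords are summed.
    color_sum = np.zeros_like(canon_rgb)
    weight_sum = np.zeros_like(canon_weight)
    np.add.at(color_sum, coords, colors * conf[:, None])
    np.add.at(weight_sum, coords, conf)

    # Only points that received a positive-weight observation are refined;
    # every other point keeps the value propagated from earlier frames.
    touched = weight_sum > 0
    new_weight = canon_weight + weight_sum
    new_rgb = canon_rgb.copy()
    new_rgb[touched] = (
        canon_rgb[touched] * canon_weight[touched, None] + color_sum[touched]
    ) / new_weight[touched, None]
    return new_rgb, new_weight


# Usage: two consecutive "frames" observing a 4-point canonical buffer.
rgb = np.zeros((4, 3))
weight = np.zeros(4)
rgb, weight = update_canonical(
    rgb, weight,
    coords=np.array([1, 1, 3]),
    colors=np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
    conf=np.array([1.0, 1.0, 2.0]),
)
rgb, weight = update_canonical(
    rgb, weight,
    coords=np.array([1]),
    colors=np.array([[1.0, 1.0, 1.0]]),
    conf=np.array([2.0]),
)
```

In this sketch, points that the current frame does not observe simply retain what earlier frames contributed, which is one simple way to avoid recomputing appearance that has not changed.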
Taxonomy
Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
