Disambiguating Monocular Reconstruction of 3D Clothed Human with Spatial-Temporal Transformer

Yong Deng; Baoxing Li; Xu Zhao

arXiv:2410.16337·cs.CV·October 23, 2024

TL;DR

This paper introduces a Spatial-Temporal Transformer network that improves monocular 3D clothed human reconstruction by capturing global and temporal information, addressing ambiguities caused by viewpoint and lighting conditions.

Contribution

The proposed STT network integrates spatial and temporal transformers to enhance normal map prediction and feature consistency, advancing monocular 3D human reconstruction techniques.
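The summary above does not specify the STT architecture, so the following is only a generic sketch of the attention operation that transformers rely on, applied across video frames. It assumes each frame has already been reduced to one feature vector, uses a single head, and uses the features directly as queries, keys, and values (no learned projections). The names `softmax` and `temporal_self_attention` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Single-head scaled dot-product self-attention over frames.

    frames: list of T feature vectors, each a list of d floats.
    Assumption: queries, keys, and values are the raw features
    (identity projections); a real model would learn these.
    Returns (outputs, weights), where weights[i][t] is how much
    frame i attends to frame t.
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in frames]
        w = softmax(scores)
        # Weighted average of all frame features.
        mixed = [sum(w[t] * frames[t][j] for t in range(len(frames)))
                 for j in range(d)]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights
```

Each output vector is a weighted average of all frames' features, which is how attention of this kind can pull information from neighboring frames into a frame whose local appearance is ambiguous.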

Findings

01. Outperforms state-of-the-art methods on Adobe and MonoPerfCap datasets.

02. Maintains robust generalization under low-light outdoor conditions.

03. Effectively reduces ambiguity in back details and local image variations.

Abstract

Reconstructing 3D clothed humans from monocular camera data is highly challenging due to viewpoint limitations and image ambiguity. While implicit function-based approaches, combined with prior knowledge from parametric models, have made significant progress, there are still two notable problems. Firstly, the back details of human models are ambiguous due to viewpoint invisibility. The quality of the back details depends on the back normal map predicted by a convolutional neural network (CNN). However, the CNN lacks global information awareness for comprehending the back texture, resulting in excessively smooth back details. Secondly, a single image suffers from local ambiguity due to lighting conditions and body movement. However, implicit functions are highly sensitive to pixel variations in ambiguous regions. To address these ambiguities, we propose the Spatial-Temporal Transformer…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Textile materials and evaluations · Industrial Vision Systems and Defect Detection

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Multi-Head Attention · Adam · Dropout