TL;DR
This paper introduces SFGS, a novel method for reconstructing detailed, expressive 3D human avatars from monocular videos. It captures fine details such as hand movements and facial expressions using a single training stage.
Contribution
SFGS combines spatial and temporal features (a spatial-only triplane and a time-aware hexplane) with a structure-aware Gaussian module and a residual hand-refinement module, advancing detailed avatar reconstruction from monocular videos.
Findings
Outperforms state-of-the-art methods in quantitative metrics.
Generates high-fidelity avatars with natural motion and fine details.
Requires only a single-stage training process.
Abstract
Reconstructing photorealistic and topology-aware human avatars from monocular videos remains a significant challenge in computer vision and graphics. While existing 3D human avatar modeling approaches can effectively capture body motion, they often fail to accurately model fine details such as hand movements and facial expressions. To address this, we propose Structure-aware Fine-grained Gaussian Splatting (SFGS), a novel method for reconstructing expressive and coherent full-body 3D human avatars from a monocular video sequence. SFGS uses both a spatial-only triplane and a time-aware hexplane to capture dynamic features across consecutive frames. A structure-aware Gaussian module is designed to capture pose-dependent details in a spatially coherent manner and to improve pose and texture expression. To better model hand deformations, we also propose a residual refinement…
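The abstract does not say how SFGS queries or fuses its plane features. As a rough illustration of the general triplane/hexplane idea only, the NumPy sketch below looks up a feature vector for a 3D point (and a time step) by bilinearly sampling axis-aligned 2D feature planes. The plane keys, resolution, feature size, normalization of coordinates and time to [0, 1], and fusion rule (concatenation for the triplane; a HexPlane-style element-wise product of each spatial plane with its complementary space-time plane for the hexplane) are all assumptions for illustration, not the paper's design.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a feature plane of shape (H, W, C) at normalized
    coordinates u (width axis) and v (height axis) in [0, 1]; returns (C,)."""
    H, W, _ = plane.shape
    u = float(np.clip(u, 0.0, 1.0))
    v = float(np.clip(v, 0.0, 1.0))
    # Map normalized coordinates to continuous pixel positions.
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    # Interpolate along the width axis, then along the height axis.
    top = (1 - wx) * plane[y0, x0] + wx * plane[y0, x1]
    bottom = (1 - wx) * plane[y1, x0] + wx * plane[y1, x1]
    return (1 - wy) * top + wy * bottom

def triplane_features(planes, p):
    """Spatial-only lookup: sample the xy, xz, yz planes and concatenate (3C,)."""
    x, y, z = p
    return np.concatenate([
        bilinear_sample(planes["xy"], x, y),
        bilinear_sample(planes["xz"], x, z),
        bilinear_sample(planes["yz"], y, z),
    ])

def hexplane_features(planes, p, t):
    """Time-aware lookup (assumed HexPlane-style fusion): multiply each spatial
    plane's feature by its complementary space-time plane's feature, then
    concatenate the three products (3C,)."""
    x, y, z = p
    pairs = [("xy", (x, y), "zt", (z, t)),
             ("xz", (x, z), "yt", (y, t)),
             ("yz", (y, z), "xt", (x, t))]
    fused = []
    for space_key, space_uv, time_key, time_uv in pairs:
        f_space = bilinear_sample(planes[space_key], *space_uv)
        f_time = bilinear_sample(planes[time_key], *time_uv)
        fused.append(f_space * f_time)
    return np.concatenate(fused)

# Usage with random (hypothetical) planes: resolution R, C channels per plane.
rng = np.random.default_rng(0)
C, R = 8, 32
tri = {k: rng.standard_normal((R, R, C)) for k in ("xy", "xz", "yz")}
hexp = {k: rng.standard_normal((R, R, C))
        for k in ("xy", "xz", "yz", "xt", "yt", "zt")}
point, t = (0.25, 0.5, 0.75), 0.4
feature = np.concatenate([triplane_features(tri, point),
                          hexplane_features(hexp, point, t)])
print(feature.shape)  # (48,)
```

Factorizing a 3D (or 4D) field into 2D planes keeps memory roughly quadratic rather than cubic in resolution. In a full pipeline, a per-Gaussian feature like this could be decoded by a small network into Gaussian attributes; how SFGS actually does this is not stated in the excerpt above.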
