# DevilSight: Augmenting Monocular Human Avatar Reconstruction through a Virtual Perspective

**Authors:** Yushuo Chen, Ruizhi Shao, Youxin Pang, Hongwen Zhang, Xinyi Wu, Rihui Wu, Yebin Liu

arXiv: 2509.00403 · 2025-09-03

## TL;DR

DevilSight is a framework for monocular human avatar reconstruction that uses a video generative model to render the subject's motion from an alternative perspective as additional supervision, yielding more detailed and plausible 3D avatars from a single monocular video.

## Contribution

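The paper proposes using Human4DiT, a video generative model, to generate supervision from an alternative viewpoint. It also introduces two strategies to improve that generation: video fine-tuning that injects the physical identity into the model for consistent reproduction of human motion, and a patch-based denoising algorithm for higher-resolution outputs with finer details.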

## Key findings

- Outperforms recent state-of-the-art approaches in the reported experiments
- Enriches details in regions not seen in the input video
- Regularizes the avatar representation, which mitigates artifacts

## Abstract

We present a novel framework to reconstruct human avatars from monocular videos. Recent approaches have struggled either to capture the fine-grained dynamic details from the input or to generate plausible details at novel viewpoints; both problems mainly stem from the limited representational capacity of the avatar model and insufficient observational data. To overcome these challenges, we propose to leverage the advanced video generative model Human4DiT to generate the human motions from an alternative perspective as an additional supervision signal. This approach not only enriches the details in previously unseen regions but also effectively regularizes the avatar representation to mitigate artifacts. Furthermore, we introduce two complementary strategies to enhance video generation. To ensure consistent reproduction of human motion, we inject the physical identity into the model through video fine-tuning. For higher-resolution outputs with finer details, we employ a patch-based denoising algorithm. Experimental results demonstrate that our method outperforms recent state-of-the-art approaches and validate the effectiveness of our proposed strategies.
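The abstract names a patch-based denoising algorithm but does not spell it out in this summary view. As a rough illustration of the general idea only, the sketch below applies one denoising step to overlapping patches of a frame and averages the overlaps. This is a common tiling scheme and an assumption here, not the paper's algorithm; `patch_starts`, `patchwise_denoise_step`, `denoise_fn`, and the patch and stride sizes are all hypothetical names and values.

```python
import numpy as np


def patch_starts(size, patch, stride):
    """Start offsets of patches covering [0, size); the last patch sits flush with the end."""
    if size <= patch:
        return [0]
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)
    return starts


def patchwise_denoise_step(latent, denoise_fn, patch=64, stride=48):
    """Run denoise_fn on overlapping patches of an (H, W[, C]) array and average overlaps.

    denoise_fn is a hypothetical stand-in for one step of a generative model's
    denoiser; it must return an array with the same shape as its input patch.
    """
    h, w = latent.shape[:2]
    out = np.zeros_like(latent, dtype=np.float64)
    # Per-pixel count of how many patches contributed, broadcastable over channels.
    weight = np.zeros((h, w) + (1,) * (latent.ndim - 2), dtype=np.float64)
    for y in patch_starts(h, patch, stride):
        for x in patch_starts(w, patch, stride):
            tile = latent[y:y + patch, x:x + patch]
            out[y:y + patch, x:x + patch] += denoise_fn(tile)
            weight[y:y + patch, x:x + patch] += 1.0
    return out / weight


if __name__ == "__main__":
    frame = np.random.default_rng(0).normal(size=(100, 130, 3))
    # Toy "denoiser" that simply halves its input.
    result = patchwise_denoise_step(frame, lambda t: 0.5 * t)
    print(result.shape)
```

Averaging overlapping patches lets a model trained at a fixed resolution process a larger frame while keeping patch seams from showing as hard edges. A real pipeline would repeat this step across the diffusion schedule and along the time axis of the video.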

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00403/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00403/full.md

## References

93 references, listed in full in the complete paper: https://tomesphere.com/paper/2509.00403/full.md

---
Source: https://tomesphere.com/paper/2509.00403