TL;DR
PercHead introduces a novel perceptual loss based on DINOv2 and SAM 2.1, enabling high-quality, disentangled 3D head reconstruction and editing from a single image, with state-of-the-art results and robust editing capabilities.
Contribution
The paper proposes a new perceptual loss for 3D head modeling, builds the model on Vision Transformers to decouple the 3D representation from the 2D input, and extends the approach to interactive 3D editing with intuitive controls.
Findings
Achieves state-of-the-art performance in novel-view synthesis.
Exhibits robustness to extreme viewing angles.
Enables disentangled 3D editing through an intuitive GUI.
Abstract
We present PercHead, a model for single-image 3D head reconstruction and disentangled 3D editing, two tasks that are inherently challenging due to ambiguity in plausible explanations for the same input. At the heart of our approach lies our novel perceptual loss based on DINOv2 and SAM 2.1. Unlike widely adopted low-level losses like LPIPS, SSIM or L1, we rely on deep visual understanding of images and the resulting generalized supervision signals. We show that our new loss can be a drop-in replacement for standard losses and used to improve visual quality in high-frequency areas. We base our model architecture on Vision Transformers (ViTs), allowing us to decouple the 3D representation from the 2D input. We train our method on multi-view images for view-consistency and in-the-wild images for strong transferability to new environments. Our model achieves state-of-the-art performance in…
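The abstract does not give the loss formulation, so the following is only a generic sketch of how a feature-space perceptual loss over two frozen encoders could be assembled. Everything in it is an assumption made for illustration: the fixed random linear maps stand in for the DINOv2 and SAM 2.1 backbones, and the patch size, cosine distance, and equal weights are hypothetical choices rather than the paper's.

```python
import math
import random


def make_frozen_encoder(patch_dim, feat_dim, seed=0):
    """Hypothetical stand-in for a frozen pretrained encoder.

    A fixed random linear map from a flattened patch to a feature vector.
    It is NOT DINOv2 or SAM 2.1; it only plays their role in this sketch.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(patch_dim)]
               for _ in range(feat_dim)]

    def encode(patch):
        return [sum(w * x for w, x in zip(row, patch)) for row in weights]

    return encode


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patches."""
    height, width = len(image), len(image[0])
    return [
        [image[top + i][left + j] for i in range(patch) for j in range(patch)]
        for top in range(0, height - patch + 1, patch)
        for left in range(0, width - patch + 1, patch)
    ]


def cosine_distance(a, b, eps=1e-8):
    """1 - cosine similarity; eps guards against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def perceptual_loss(pred, target, encoders, weights, patch=4):
    """Weighted sum over encoders of the mean per-patch feature distance."""
    total = 0.0
    for encode, weight in zip(encoders, weights):
        pred_tokens = [encode(p) for p in patchify(pred, patch)]
        target_tokens = [encode(p) for p in patchify(target, patch)]
        dists = [cosine_distance(a, b)
                 for a, b in zip(pred_tokens, target_tokens)]
        total += weight * sum(dists) / len(dists)
    return total


# Usage: an 8x8 target image and a noisy copy of it.
rng = random.Random(1)
target = [[rng.random() for _ in range(8)] for _ in range(8)]
noisy = [[v + 0.5 * rng.gauss(0.0, 1.0) for v in row] for row in target]

# Two stand-in encoders, mirroring the idea of combining two backbones.
encoders = [make_frozen_encoder(16, 8, seed=0),
            make_frozen_encoder(16, 8, seed=1)]
loss_same = perceptual_loss(target, target, encoders, [1.0, 1.0])
loss_noisy = perceptual_loss(noisy, target, encoders, [1.0, 1.0])
```

The point of the sketch is that supervision comes from distances between encoder features rather than between pixels, so the loss is near zero for identical images and grows as the features diverge. In practice the encoders would be the pretrained networks kept frozen, and the loss would be computed in a differentiable framework so gradients reach the reconstruction model.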
