AHOY! Animatable Humans under Occlusion from YouTube Videos with Gaussian Splatting and Video Diffusion Priors

Aymen Mir; Riza Alp Guler; Xiangjun Tang; Peter Wonka; Gerard Pons-Moll

arXiv:2603.17975 · cs.CV · March 19, 2026

Open Access

TL;DR

AHOY introduces a pipeline for reconstructing complete, animatable 3D Gaussian avatars from occluded monocular videos. It combines identity-finetuned diffusion models, a two-stage canonical-to-pose-dependent architecture, and decoupled supervision, and reports state-of-the-art reconstruction quality on occluded YouTube videos.

Contribution

The paper presents a new method combining diffusion-based hallucination, a two-stage architecture, and decoupled supervision to handle occlusion in 3D avatar reconstruction from monocular videos.

Findings

01

State-of-the-art reconstruction quality on occluded YouTube videos

02

Robust avatar animation with novel poses

03

Effective handling of multi-view inconsistencies

Abstract

We present AHOY, a method for reconstructing complete, animatable 3D Gaussian avatars from in-the-wild monocular video despite heavy occlusion. Existing methods assume unoccluded input (a fully visible subject, often in a canonical pose), excluding the vast majority of real-world footage where people are routinely occluded by furniture, objects, or other people. Reconstructing from such footage poses fundamental challenges: large body regions may never be observed, and multi-view supervision per pose is unavailable. We address these challenges with four contributions: (i) a hallucination-as-supervision pipeline that uses identity-finetuned diffusion models to generate dense supervision for previously unobserved body regions; (ii) a two-stage canonical-to-pose-dependent architecture that bootstraps from sparse observations to full pose-dependent Gaussian maps; (iii) a map-pose/LBS-pose…

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition