Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos
Yujin Ham, Junho Kim, Vivek Boominathan, Guha Balakrishnan

TL;DR
This paper introduces a generative inpainting method to remove humans from egocentric walking tour videos, enabling better environment modeling and 3D reconstruction.
Contribution
The paper builds a semi-synthetic dataset of paired video clips and fine-tunes a diffusion model on it to remove humans and their shadows from walking tour videos, yielding cleaner footage for environment modeling.
Findings
The model outperforms Casper in human removal quality.
Generated clips enable accurate 3D/4D urban environment modeling.
The dataset maintains high visual diversity from real egocentric videos.
Abstract
Egocentric "walking tour" videos provide a rich source of image data for developing diverse visual models of environments around the world. However, the significant presence of humans in frames of these videos, due to crowds and eye-level camera perspectives, limits their usefulness in environment modeling applications. We focus on addressing this challenge by developing a generative algorithm that can realistically remove (i.e., inpaint) humans and their associated shadow effects from walking tour videos. Key to our approach is the construction of a rich semi-synthetic dataset of video clip pairs to train this generative model. Each pair in the dataset consists of an environment-only background clip, and a composite clip of walking humans with simulated shadows overlaid on the background. We randomly sourced both foreground and background components from real egocentric walking…
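The pair construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the per-frame alpha matte, the shadow mask, and the simple multiplicative shadow darkening are all assumptions made for illustration, since the excerpt does not say how shadows are simulated or how foreground humans are extracted.

```python
import numpy as np

def composite_frame(background, foreground, alpha, shadow_mask, shadow_strength=0.5):
    """Overlay one human layer and its shadow on one background frame.

    background, foreground: float arrays of shape (H, W, 3) in [0, 1].
    alpha, shadow_mask: float arrays of shape (H, W) in [0, 1].
    shadow_strength is a hypothetical parameter, not taken from the paper.
    """
    # Assumption: a shadow is modeled as multiplicative darkening of the background.
    shaded = background * (1.0 - shadow_strength * shadow_mask[..., None])
    # Standard alpha blending of the human layer over the shaded background.
    a = alpha[..., None]
    return a * foreground + (1.0 - a) * shaded

def make_training_pair(bg_clip, fg_clip, alpha_clip, shadow_clip, shadow_strength=0.5):
    """Return (composite_clip, target_clip) for clips of shape (T, H, W, ...)."""
    composite = np.stack([
        composite_frame(b, f, a, s, shadow_strength)
        for b, f, a, s in zip(bg_clip, fg_clip, alpha_clip, shadow_clip)
    ])
    # The environment-only background clip is the inpainting target.
    return composite, bg_clip

# Toy example: 2 frames of 4x4 pixels, one "human" pixel and one "shadow" pixel.
bg = np.full((2, 4, 4, 3), 0.8)
fg = np.zeros((2, 4, 4, 3))
alpha = np.zeros((2, 4, 4))
alpha[:, 1, 1] = 1.0
shadow = np.zeros((2, 4, 4))
shadow[:, 2, 1] = 1.0
composite, target = make_training_pair(bg, fg, alpha, shadow)
```

A model trained on such pairs sees the composite clip as input and the background clip as the target, so it learns to undo both the human pixels and the darkened shadow region.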
