Static Scene Reconstruction from Dynamic Egocentric Videos
Qifei Cui, Patrick Chen

TL;DR
This paper presents a robust method for static scene reconstruction from egocentric videos, addressing challenges like dynamic foreground interference and long-term drift, resulting in cleaner and more accurate 3D maps.
Contribution
It introduces a mask-aware reconstruction mechanism and chunked pose-graph stitching to improve static scene reconstruction in dynamic egocentric videos.
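The summary does not spell out how the mask-aware mechanism is wired into the backbone. One plausible reading of "suppresses dynamic foreground in the attention layers" is a key mask: tokens whose image patches overlap a dynamic region (for example, a hand) are excluded as attention keys, so no static token can absorb their features. The sketch below assumes exactly that. The function name and the per-token boolean `dynamic_mask` are hypothetical and do not describe the authors' implementation or MapAnything's API.

```python
import numpy as np


def masked_attention(q, k, v, dynamic_mask):
    """Single-head scaled dot-product attention that ignores dynamic keys.

    q, k, v: arrays of shape (n_tokens, d).
    dynamic_mask: hypothetical boolean array of shape (n_tokens,); True marks a
    token whose image patch overlaps the dynamic foreground (e.g. a hand).
    """
    if dynamic_mask.all():
        raise ValueError("at least one static token is required")
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # Every query is forbidden from attending to dynamic keys.
    logits = np.where(dynamic_mask[None, :], -np.inf, logits)
    # Subtract the row max for numerical stability before exponentiating.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
mask = np.array([False, True, False, True])
out, weights = masked_attention(q, k, v, mask)
print(weights[:, mask].max())  # dynamic keys receive zero weight
```

Because masked logits become negative infinity before the softmax, the dynamic keys get exactly zero weight and each row still sums to one over the static keys. A real system would apply the same mask per head and per layer, and would need a segmentation model to produce `dynamic_mask`.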
Findings
Significantly reduces trajectory error in egocentric videos.
Produces visually cleaner static geometry compared to baseline methods.
Effective on both HD-EPIC and indoor drone datasets.
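The summary does not state how trajectory error is measured. A widely used choice is absolute trajectory error (ATE) RMSE over camera centers after aligning the estimated trajectory to ground truth. The sketch below assumes that protocol with Umeyama's least-squares similarity alignment (scale optional); the helper names are hypothetical and the paper's actual evaluation may differ.

```python
import numpy as np


def align_umeyama(src, dst, with_scale=True):
    """Least-squares similarity (or rigid) transform mapping src onto dst.

    src, dst: (n, 3) arrays of corresponding positions. Returns (s, R, t)
    such that dst_i is approximately s * R @ src_i + t.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # avoid returning a reflection
    R = U @ S @ Vt
    var_src = np.mean(np.sum(src_c ** 2, axis=1))
    s = np.trace(np.diag(D) @ S) / var_src if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t


def ate_rmse(est, gt, with_scale=True):
    """RMSE of camera-center positions after aligning est to gt."""
    s, R, t = align_umeyama(est, gt, with_scale)
    aligned = s * est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))


def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


rng = np.random.default_rng(1)
gt = rng.normal(size=(50, 3))  # ground-truth camera centers
# Same path expressed in a different frame and at a different scale.
est = 0.5 * gt @ rotation_z(0.4).T + np.array([1.0, 2.0, 3.0])
print(ate_rmse(est, gt))                    # near zero: a similarity transform explains the gap
print(ate_rmse(est, gt, with_scale=False))  # larger: rigid alignment cannot fix scale
```

Scale alignment matters for monocular reconstructions, whose metric scale may be unreliable; whether to allow it is an evaluation choice that should match the protocol being compared against.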
Abstract
Egocentric videos present unique challenges for 3D reconstruction due to rapid camera motion and frequent dynamic interactions. State-of-the-art static reconstruction systems, such as MapAnything, often degrade in these settings, suffering from catastrophic trajectory drift and "ghost" geometry caused by moving hands. We bridge this gap by proposing a robust pipeline that adapts static reconstruction backbones to long-form egocentric video. Our approach introduces a mask-aware reconstruction mechanism that explicitly suppresses dynamic foreground in the attention layers, preventing hand artifacts from contaminating the static map. Furthermore, we employ a chunked reconstruction strategy with pose-graph stitching to ensure global consistency and eliminate long-term drift. Experiments on HD-EPIC and indoor drone datasets demonstrate that our pipeline significantly improves absolute…
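To illustrate the chunked strategy, the sketch below splits a video into overlapping windows, assumes each window has been reconstructed independently in its own local frame, and chains the windows together through their shared frames. This covers only the simplest case: a chain-shaped pose graph with no loop closures, where stitching reduces to composing relative transforms. A full pose-graph stitcher, as the abstract describes, would also add loop-closure edges and solve a joint least-squares problem. All names (`make_chunks`, `stitch_chunks`, `rigid`) are hypothetical.

```python
import numpy as np


def make_chunks(n_frames, chunk_size, overlap):
    """Split frame indices 0..n_frames-1 into windows sharing `overlap` frames."""
    assert 0 < overlap < chunk_size
    step = chunk_size - overlap
    chunks, start = [], 0
    while True:
        end = min(start + chunk_size, n_frames)
        chunks.append(list(range(start, end)))
        if end == n_frames:
            break
        start += step
    return chunks


def stitch_chunks(chunks, local_poses):
    """Chain chunk-local 4x4 camera-to-chunk poses into one global frame.

    local_poses: one dict per chunk, mapping frame index -> 4x4 pose.
    The first chunk defines the world frame.
    """
    world = dict(local_poses[0])
    for frames, local in zip(chunks[1:], local_poses[1:]):
        anchor = frames[0]  # first frame shared with the previous chunk
        # G maps this chunk's frame into the world: G @ local[anchor] == world[anchor].
        G = world[anchor] @ np.linalg.inv(local[anchor])
        for f in frames:
            if f not in world:
                world[f] = G @ local[f]
    return world


def rigid(theta, t):
    """4x4 rigid transform: rotation about z by theta, then translation t."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T


gt = {f: rigid(0.1 * f, [f, 0.5 * f, 0.0]) for f in range(10)}
chunks = make_chunks(10, chunk_size=4, overlap=1)
# Each chunk after the first sees the scene in an arbitrary local frame.
offsets = [np.eye(4), rigid(0.7, [2.0, -1.0, 0.0]), rigid(-1.2, [0.0, 3.0, 1.0])]
local_poses = [{f: np.linalg.inv(H) @ gt[f] for f in frames}
               for H, frames in zip(offsets, chunks)]
world = stitch_chunks(chunks, local_poses)
print(chunks)
```

With noise-free chunk poses the chained result recovers the ground-truth trajectory exactly. With real estimates, anchoring on a single shared frame lets errors accumulate along the chain, which is the drift that joint pose-graph optimization over all overlaps and loop closures is meant to remove.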
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
