EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting
Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang

TL;DR
EgoGaussian reconstructs 3D scenes and tracks 3D object motion from RGB egocentric video alone, capturing dynamic human-object interactions without multi-camera setups, depth-sensing cameras, or kinesthetic sensors.
Contribution
EgoGaussian is introduced as the first method to simultaneously reconstruct 3D scenes and track 3D object motion from RGB egocentric video alone, using 3D Gaussian Splatting and online learning.
Findings
Outperforms state-of-the-art methods in dynamic scene reconstruction quality
Effectively segments and tracks object motion in egocentric videos
Qualitative results show high-fidelity 3D reconstructions
Abstract
Human activities are inherently complex, often involving numerous object interactions. To better understand these activities, it is crucial to model their interactions with the environment captured through dynamic changes. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand human-object interactions in 3D environments. However, most existing methods for human activity modeling neglect the dynamic interactions with objects, resulting in only static representations. The few existing solutions often require inputs from multiple sources, including multi-camera setups, depth-sensing cameras, or kinesthetic sensors. To this end, we introduce EgoGaussian, the first method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone. We leverage…
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
