EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting

Daiwei Zhang; Gengyan Li; Jiajie Li; Mickaël Bressieux; Otmar Hilliges; Marc Pollefeys; Luc Van Gool; Xi Wang

arXiv:2406.19811 · cs.CV · October 3, 2024


TL;DR

EgoGaussian is a method that reconstructs 3D scenes and tracks object motion from egocentric RGB video alone. It captures dynamic human-object interactions without additional sensors, advancing scene understanding of complex human activities.

Contribution

It introduces the first approach that simultaneously reconstructs 3D scenes and tracks 3D object motion from RGB egocentric video alone, using 3D Gaussian Splatting and online learning.
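This page does not spell out how EgoGaussian represents object motion. As a hedged illustration only, the sketch below assumes that a moving object is rigid, that its Gaussians have already been separated from the static background by a boolean mask, and that a single pose (R, t) is applied to all of them at a given time step. It relies on the standard identity that the map x -> R x + t sends a Gaussian N(mu, Sigma) to N(R mu + t, R Sigma R^T). The names Gaussian3D, apply_rigid and transform_object are hypothetical, and practical 3D Gaussian Splatting code usually stores each covariance as a scale vector plus a rotation quaternion rather than as a full matrix.

```python
import math
from dataclasses import dataclass

# Hypothetical minimal types: nested lists stand in for 3x3 matrices and 3-vectors.
Vec3 = list[float]
Mat3 = list[list[float]]


def transpose(a: Mat3) -> Mat3:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_z(theta: float) -> Mat3:
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


@dataclass
class Gaussian3D:
    """One 3D Gaussian primitive: center (mean) and full 3x3 covariance."""
    mean: Vec3
    cov: Mat3


def apply_rigid(g: Gaussian3D, R: Mat3, t: Vec3) -> Gaussian3D:
    """Move a Gaussian by the rigid transform x -> R x + t.

    The mean becomes R mu + t and the covariance becomes R Sigma R^T;
    the translation t does not change the covariance.
    """
    mean = [m + dt for m, dt in zip(matvec(R, g.mean), t)]
    cov = matmul(matmul(R, g.cov), transpose(R))
    return Gaussian3D(mean, cov)


def transform_object(gaussians: list[Gaussian3D], is_dynamic: list[bool],
                     R: Mat3, t: Vec3) -> list[Gaussian3D]:
    """Apply one shared (assumed rigid) object pose to dynamic Gaussians; keep static ones as-is."""
    return [apply_rigid(g, R, t) if dyn else g for g, dyn in zip(gaussians, is_dynamic)]


if __name__ == "__main__":
    diag = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    scene = [Gaussian3D([0.0, 0.0, 0.0], diag),   # static background Gaussian
             Gaussian3D([1.0, 0.0, 0.0], diag)]   # Gaussian belonging to a moving object
    moved = transform_object(scene, [False, True], rot_z(math.pi / 2), [1.0, 0.0, 0.0])
    print(moved[1].mean)  # approximately [1.0, 1.0, 0.0]
```

Returning the static Gaussians unchanged mirrors the separation of moving objects from the background mentioned in the findings; in a real system the pose (R, t) would come from whatever tracking step the method uses, which this sketch does not model.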

Findings

01. Outperforms state-of-the-art methods in dynamic scene reconstruction quality.
02. Effectively segments and tracks object motion in egocentric videos.
03. Qualitative results show high-fidelity 3D reconstructions.

Abstract

Human activities are inherently complex, often involving numerous object interactions. To better understand these activities, it is crucial to model their interactions with the environment captured through dynamic changes. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand human-object interactions in 3D environments. However, most existing methods for human activity modeling neglect the dynamic interactions with objects, resulting in only static representations. The few existing solutions often require inputs from multiple sources, including multi-camera setups, depth-sensing cameras, or kinesthetic sensors. To this end, we introduce EgoGaussian, the first method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone. We leverage…


Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Video Surveillance and Tracking Methods

Methods: Focus