3D-Aware Instance Segmentation and Tracking in Egocentric Videos
Yash Bhalgat, Vadim Tschernezki, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman

TL;DR
This paper presents a 3D-aware method for instance segmentation and tracking in egocentric videos, significantly improving accuracy and consistency over existing 2D approaches by leveraging scene geometry and temporal cues.
Contribution
It introduces a novel 3D-aware framework that integrates scene geometry and object tracking for robust egocentric video analysis, outperforming state-of-the-art methods.
Findings
Outperforms previous methods by 7 points in Association Accuracy.
Reduces ID switches by 73-80%.
Enhances downstream 3D reconstruction and amodal segmentation.
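To make the ID-switch figure above concrete, here is a minimal sketch of how such a count and its relative reduction could be computed. It assumes a simplified definition: a switch occurs whenever a ground-truth object is assigned a predicted ID different from the one it last carried. The function names and input format are hypothetical and are not taken from the paper or its evaluation code.

```python
def count_id_switches(frames):
    """Count ID switches under a simplified definition (an assumption).

    frames: list of dicts, one per frame, mapping ground-truth id -> predicted id
    for the objects visible in that frame.
    """
    last_seen = {}  # ground-truth id -> most recent predicted id
    switches = 0
    for frame in frames:
        for gt_id, pred_id in frame.items():
            # A switch is counted when the predicted id changes for the same object.
            if gt_id in last_seen and last_seen[gt_id] != pred_id:
                switches += 1
            last_seen[gt_id] = pred_id
    return switches


def relative_reduction(baseline, ours):
    """Percentage reduction of `ours` relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Toy example: "cup" disappears in no frame, "pan" is occluded in frame 2.
example_frames = [
    {"cup": 1, "pan": 2},
    {"cup": 1},
    {"cup": 3, "pan": 2},
    {"cup": 3, "pan": 4},
]
example_switches = count_id_switches(example_frames)
```

Under this definition an object that is occluded and later reappears with a new ID still counts as a switch, which is the failure mode that matters most in egocentric video.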
Abstract
Egocentric videos present unique challenges for 3D scene understanding due to rapid camera motion, frequent object occlusions, and limited object visibility. This paper introduces a novel approach to instance segmentation and tracking in first-person video that leverages 3D awareness to overcome these obstacles. Our method integrates scene geometry, 3D object centroid tracking, and instance segmentation to create a robust framework for analyzing dynamic egocentric scenes. By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches. Extensive evaluations on the challenging EPIC Fields dataset demonstrate significant improvements across a range of tracking and segmentation consistency metrics. Specifically, our method outperforms the next best performing approach by 7 points in Association Accuracy (AssA) and points in…
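The idea of tracking instances by their 3D centroids can be illustrated with a small sketch. This is not the authors' pipeline: it assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), per-pixel depth for each mask, centroids already expressed in a common frame, and a greedy nearest-centroid matcher with a hypothetical distance threshold `max_dist`. All function names are illustrative.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at `depth` to camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def mask_centroid_3d(pixels, fx, fy, cx, cy):
    """Mean 3D point of a mask given as (u, v, depth) triples."""
    pts = [backproject(u, v, d, fx, fy, cx, cy) for u, v, d in pixels]
    n = len(pts)
    return tuple(sum(p[i] for p in pts) / n for i in range(3))


def associate(tracks, detections, max_dist=0.5):
    """Greedy nearest-centroid matching (an assumed, simplified association rule).

    tracks: dict of integer track_id -> 3D centroid.
    detections: list of 3D centroids in the same frame of reference.
    Returns dict of detection index -> track id; unmatched detections get new ids.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(c, d), tid, j)
        for tid, c in tracks.items()
        for j, d in enumerate(detections)
    )
    assigned, used_tracks = {}, set()
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if tid in used_tracks or j in assigned:
            continue
        assigned[j] = tid
        used_tracks.add(tid)
    # Start new tracks for detections that matched nothing.
    next_id = max(tracks, default=-1) + 1
    for j in range(len(detections)):
        if j not in assigned:
            assigned[j] = next_id
            next_id += 1
    return assigned


# Toy usage: two existing tracks, three detections in the next frame.
tracks = {0: (0.0, 0.0, 1.0), 1: (2.0, 0.0, 1.0)}
detections = [(2.1, 0.0, 1.0), (0.05, 0.0, 1.0), (5.0, 5.0, 5.0)]
matches = associate(tracks, detections)
```

Because matching happens in 3D rather than in image space, an object that leaves the field of view and returns at a different pixel location can still be matched to its earlier track, provided its 3D position has not changed much.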
Taxonomy
Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection
