EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset
Hao Tang, Kevin Liang, Matt Feiszli, Weiyao Wang

TL;DR
EgoTracks is a new long-term egocentric visual object tracking dataset that highlights challenges like occlusions, rapid appearance changes, and re-detection, aiming to advance tracking models in real-world, embodied AI scenarios.
Contribution
The paper introduces EgoTracks, a comprehensive dataset for long-term egocentric tracking, and provides a baseline model, EgoSTARK, which improves performance on this challenging data.
Findings
State-of-the-art trackers perform poorly on EgoTracks
Enhancements to STARK significantly improve egocentric tracking
EgoTracks emphasizes long-term re-detection challenges
Abstract
Visual object tracking is a key component of many egocentric vision problems. However, the full spectrum of challenges of egocentric tracking faced by an embodied AI is underrepresented in many existing datasets; these tend to focus on relatively short, third-person videos. Egocentric video has several characteristics that distinguish it from the videos commonly found in past datasets: frequent large camera motions and hand interactions with objects commonly lead to occlusions or objects exiting the frame, and object appearance can change rapidly due to widely different points of view, scale, or object states. Embodied tracking is also naturally long-term, and being able to consistently (re-)associate objects to their appearances and disappearances over as long as a lifetime is critical. Previous datasets under-emphasize this re-detection problem, and their "framed" nature has led to adoption of…
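To make the re-detection problem concrete, it helps to look at how long-term trackers are commonly scored: the tracker may report the target as absent in a frame, and it is penalized both for boxing a target that is out of view and for failing to re-detect it when it returns. The sketch below follows the precision/recall/F-score scheme used by long-term benchmarks such as VOT-LT, evaluated at a single confidence threshold. It is an illustration under that assumption, not necessarily the protocol EgoTracks uses, and the names `iou` and `long_term_prf` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2); None means "no box for this frame".
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def long_term_prf(
    preds: Sequence[Optional[Box]], gts: Sequence[Optional[Box]]
) -> Tuple[float, float, float]:
    """Precision, recall and F-score for a tracker that may report absence.

    preds[i] is None when the tracker says the target is absent in frame i;
    gts[i] is None when the target is occluded or out of view in frame i.
    (Hypothetical helper, modeled on long-term tracking benchmarks.)
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must cover the same frames")
    on_pred: List[float] = []  # overlaps on frames where the tracker reports a box
    on_gt: List[float] = []    # overlaps on frames where the target is visible
    for p, g in zip(preds, gts):
        # A box reported while the target is absent, or a miss while it is
        # visible, contributes an overlap of 0.
        overlap = iou(p, g) if p is not None and g is not None else 0.0
        if p is not None:
            on_pred.append(overlap)
        if g is not None:
            on_gt.append(overlap)
    precision = sum(on_pred) / len(on_pred) if on_pred else 0.0
    recall = sum(on_gt) / len(on_gt) if on_gt else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_score
```

For example, if the target leaves the frame for two of four frames, a tracker that keeps reporting its last box throughout gets precision 0.5 and recall 1.0, while a tracker that correctly reports absence but never re-detects the target after it returns gets precision 1.0 and recall 0.5. Both kinds of failure lower the F-score, which is why a short, "framed" video where the target is always visible cannot measure re-detection at all.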
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Human Pose and Action Recognition
