Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng, Haoyu Zhang, Meng Liu, Weili Guan, Liqiang Nie

TL;DR
This paper introduces OSGNet, a novel egocentric video grounding model that leverages object information and shot movement analysis to improve modality alignment and achieve state-of-the-art results.
Contribution
The paper proposes a new method that incorporates object extraction and shot movement analysis to enhance egocentric video grounding performance.
Findings
Achieves state-of-the-art results on three datasets.
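Uses object information extracted from the video to enrich video representations, especially for objects mentioned in the query but not directly captured in the video features.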
Leverages shot movement features to model wearer attention.
Abstract
Egocentric video grounding is a crucial task for embodied intelligence applications, distinct from exocentric video moment localization. Existing methods primarily focus on the distributional differences between egocentric and exocentric videos but often neglect key characteristics of egocentric videos and the fine-grained information emphasized by question-type queries. To address these limitations, we propose OSGNet, an Object-Shot enhanced Grounding Network for egocentric video. Specifically, we extract object information from videos to enrich video representation, particularly for objects highlighted in the textual query but not directly captured in the video features. Additionally, we analyze the frequent shot movements inherent to egocentric videos, leveraging these features to extract the wearer's attention information, which enhances the model's ability to perform modality…
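The abstract names two mechanisms: enriching video features with extracted object information, and using shot movement as a cue for the wearer's attention. The sketch below shows one possible shape for each, assuming precomputed per-clip and per-object embeddings of the same dimension. It is not OSGNet's architecture. The residual scaled dot-product cross-attention, the cosine-distance movement proxy, and the names `enrich_with_objects` and `shot_movement_scores` are all hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def enrich_with_objects(clip_feats: List[Vector], object_feats: List[Vector]) -> List[Vector]:
    """Hypothetical object enrichment: each clip feature attends over object
    embeddings (scaled dot-product cross-attention), and the result is added back
    residually, so objects absent from the clip features can still contribute."""
    d = len(clip_feats[0])
    scale = 1.0 / math.sqrt(d)
    enriched = []
    for q in clip_feats:
        weights = softmax([dot(q, k) * scale for k in object_feats])
        attended = [sum(w * obj[i] for w, obj in zip(weights, object_feats)) for i in range(d)]
        enriched.append([qi + ai for qi, ai in zip(q, attended)])
    return enriched


def shot_movement_scores(clip_feats: List[Vector]) -> List[float]:
    """Hypothetical shot-movement proxy: 1 - cosine similarity between consecutive
    clip features. Large values suggest a camera (head) movement."""
    scores = [0.0]  # the first clip has no predecessor
    for prev, cur in zip(clip_feats, clip_feats[1:]):
        norm = math.sqrt(dot(prev, prev)) * math.sqrt(dot(cur, cur))
        scores.append(1.0 - dot(prev, cur) / (norm + 1e-8))
    return scores
```

For example, with clips `[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]`, the movement score between the first two clips is near 0 and between the last two is near 1. A real model would learn the projections and fuse these signals with the text query, which this sketch omits.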
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
Methods: Softmax · Attention Is All You Need · Focus
