TL;DR
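InteractionGCN is a graph convolutional network framework for categorizing social interactions in egocentric videos. It builds a frame-level relational graph from relational and non-relational cues, estimates the interactional context with a GCN, and propagates that context over time, together with first-person motion information, through a Gated Recurrent Unit. It reports state-of-the-art results on two publicly available datasets.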
Contribution
The paper presents a new GCN-based framework for social interaction recognition in egocentric videos, integrating relational cues and temporal modeling.
Findings
Achieves state-of-the-art performance on two publicly available datasets.
Validates the effectiveness of relational and temporal modeling.
Validates the proposed approach through ablation studies.
Abstract
In this paper we propose a new framework to categorize social interactions in egocentric videos, which we name InteractionGCN. Our method extracts patterns of relational and non-relational cues at the frame level and uses them to build a relational graph, from which the interactional context at the frame level is estimated via a Graph Convolutional Network-based approach. It then propagates this context over time, together with first-person motion information, through a Gated Recurrent Unit architecture. Ablation studies and experimental evaluation on two publicly available datasets validate the proposed approach and establish state-of-the-art results.
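The pipeline described in the abstract (frame-level relational graph, graph convolution, recurrent propagation with motion input) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say how the graph is built, what the node features are, or how the per-frame context is pooled, so the following are all assumptions made for illustration: one node per detected person, a given adjacency matrix per frame, a single Kipf-and-Welling-style GCN layer, mean pooling over nodes, a standard GRU cell, a linear softmax classifier, and random untrained weights. All function and parameter names are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with added self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ w, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h, x, p):
    """One GRU cell update of hidden state h given input x."""
    hx = np.concatenate([h, x])
    z = sigmoid(p["Wz"] @ hx + p["bz"])   # update gate
    r = sigmoid(p["Wr"] @ hx + p["br"])   # reset gate
    h_tilde = np.tanh(p["Wh"] @ np.concatenate([r * h, x]) + p["bh"])
    return (1.0 - z) * h + z * h_tilde

def init_params(rng, node_dim, gcn_dim, motion_dim, hidden_dim, n_classes):
    """Random, untrained weights; sizes are illustrative assumptions."""
    in_dim = hidden_dim + gcn_dim + motion_dim
    p = {"Wg": rng.normal(0.0, 0.1, (node_dim, gcn_dim))}
    for name in ("z", "r", "h"):
        p["W" + name] = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        p["b" + name] = np.zeros(hidden_dim)
    p["Wo"] = rng.normal(0.0, 0.1, (n_classes, hidden_dim))
    return p

def classify_clip(node_feats, adjs, motion, p):
    """node_feats: (T, N, node_dim), adjs: (T, N, N), motion: (T, motion_dim)."""
    h = np.zeros(p["bz"].shape[0])
    for x, adj, m in zip(node_feats, adjs, motion):
        # Frame-level interactional context: GCN output mean-pooled over nodes.
        context = gcn_layer(normalize_adjacency(adj), x, p["Wg"]).mean(axis=0)
        # Propagate the context over time together with the motion features.
        h = gru_step(h, np.concatenate([context, m]), p)
    logits = p["Wo"] @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy clip: 5 frames, 3 people, fully connected graph without self-loops.
rng = np.random.default_rng(0)
T, N = 5, 3
params = init_params(rng, node_dim=8, gcn_dim=16, motion_dim=4,
                     hidden_dim=32, n_classes=2)
node_feats = rng.normal(size=(T, N, 8))
adjs = np.ones((T, N, N)) - np.eye(N)
motion = rng.normal(size=(T, 4))
probs = classify_clip(node_feats, adjs, motion, params)
```

Here `probs` is a class-probability vector for the whole clip. With untrained weights the values carry no meaning; the sketch only shows how data flows from the per-frame graphs through to the clip-level prediction.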
