TL;DR
This paper introduces a self-supervised method for learning group activity features (GAFs) that leverages dynamics-aware and group-aware pretext tasks, improving group activity recognition without group activity annotations.
Contribution
It proposes a novel approach that combines DINO's local and global features with two pretext tasks, person flow estimation and group-relevant object location estimation, to learn group activity features without group activity annotations.
Findings
Achieves state-of-the-art performance in group activity retrieval and recognition.
Demonstrates the effectiveness of local motion and scene context features.
Ablation studies confirm each component's contribution.
Abstract
This paper proposes Group Activity Feature (GAF) learning without group activity annotations. Unlike prior work, which uses low-level static local features to learn GAFs, we propose leveraging dynamics-aware and group-aware pretext tasks, along with local and global features provided by DINO, for group-dynamics-aware GAF learning. To adapt DINO and GAF learning to local dynamics and global group features, our pretext tasks use person flow estimation and group-relevant object location estimation, respectively. Person flow estimation is used to represent the local motion of each person, which is an important cue for understanding group activities. In contrast, group-relevant object location estimation encourages GAFs to learn scene context (e.g., spatial relations of people and objects) as global features. Comprehensive experiments on public datasets demonstrate the state-of-the-art…
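The abstract does not say how the person flow targets or the object location objective are defined. The Python sketch below shows one plausible reading and rests entirely on assumptions. It assumes each person's flow target is the mean of a dense optical-flow field inside that person's bounding box. It also assumes both pretext tasks are plain mean-squared-error regressions. The names `person_flow_targets`, `pretext_loss`, and `weight` are hypothetical, and this is not the authors' implementation. The DINO backbone and the GAF encoder that would produce the predictions are omitted.

```python
import numpy as np


def person_flow_targets(flow, boxes):
    """Hypothetical per-person flow target: mean optical flow inside each box.

    flow:  (H, W, 2) array of per-pixel (dx, dy) displacements.
    boxes: (N, 4) integer array of (x0, y0, x1, y1), with x1/y1 exclusive.
    Returns an (N, 2) array with one mean displacement per person.
    """
    targets = np.zeros((len(boxes), 2))
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # Flatten the pixels inside the box and average their (dx, dy) vectors.
        targets[i] = flow[y0:y1, x0:x1].reshape(-1, 2).mean(axis=0)
    return targets


def pretext_loss(pred_flow, target_flow, pred_obj_xy, target_obj_xy, weight=1.0):
    """Assumed combined objective: flow MSE plus weighted object-location MSE."""
    flow_term = np.mean((pred_flow - target_flow) ** 2)  # local motion task
    loc_term = np.mean((pred_obj_xy - target_obj_xy) ** 2)  # scene context task
    return flow_term + weight * loc_term


# Toy example: only the top-left 2x2 region moves one pixel to the right.
flow = np.zeros((4, 4, 2))
flow[:2, :2] = [1.0, 0.0]
boxes = np.array([[0, 0, 2, 2], [2, 2, 4, 4]])
targets = person_flow_targets(flow, boxes)  # [[1, 0], [0, 0]]
loss = pretext_loss(np.zeros((2, 2)), targets, np.zeros(2), np.ones(2), weight=2.0)
print(targets, loss)
```

In a real training setup, the predictions would come from heads attached to the per-person features and to the GAF. The box-averaged target is used here only because it is the simplest way to turn a dense flow field into one motion vector per person.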
