Detector-Free Weakly Supervised Group Activity Recognition
Dongkeun Kim, Jinsung Lee, Minsu Cho, Suha Kwak

TL;DR
This paper introduces a Transformer-based, detector-free approach to weakly supervised group activity recognition that requires no bounding box annotations and achieves state-of-the-art results among weakly supervised methods on the Volleyball and NBA datasets.
Contribution
The proposed model uses the attention mechanism to localize and encode partial group contexts, without relying on object detectors or bounding box labels.
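One way to picture "localizing and encoding partial contexts with attention" is a set of learnable query tokens that cross-attend over the spatial positions of each frame's feature map. Each query's attention weights then act as a soft localization map, and the weighted sum of features is that query's partial context embedding. The sketch below is a minimal single-head version in NumPy under that assumption. The function name `partial_context_embeddings`, the shapes, and the projection matrices are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def partial_context_embeddings(frame_feats, queries, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention sketch (an assumption, not the paper's code).

    frame_feats:   (T, H, W, D) backbone feature maps, one per frame.
    queries:       (K, D) learnable query tokens, one per partial context.
    w_q, w_k, w_v: (D, D) projection matrices.
    Returns embeddings (T, K, D) and attention maps (T, K, H, W).
    """
    T, H, W, D = frame_feats.shape
    tokens = frame_feats.reshape(T, H * W, D)           # flatten the spatial grid
    q = queries @ w_q                                   # (K, D)
    k = tokens @ w_k                                    # (T, HW, D)
    v = tokens @ w_v                                    # (T, HW, D)
    scores = np.einsum("kd,tnd->tkn", q, k) / np.sqrt(D)
    attn = softmax(scores, axis=-1)                     # each query's weights over positions sum to 1
    emb = np.einsum("tkn,tnd->tkd", attn, v)            # attention-weighted features per query
    return emb, attn.reshape(T, len(queries), H, W)

# Usage with random stand-in features (all sizes are arbitrary placeholders).
rng = np.random.default_rng(0)
T, H, W, D, K = 4, 6, 10, 32, 8
feats = rng.standard_normal((T, H, W, D))
queries = rng.standard_normal((K, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
emb, attn = partial_context_embeddings(feats, queries, w_q, w_k, w_v)
print(emb.shape, attn.shape)  # (4, 8, 32) (4, 8, 6, 10)
```

Because no boxes are supervised, nothing forces a query onto a particular person; in this sketch the localization emerges only from whatever the attention weights learn to emphasize.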
Findings
Outperforms existing weakly supervised methods on Volleyball and NBA datasets.
Surpasses some models that rely on stronger supervision.
Demonstrates effective encoding of temporal evolution of group activities.
Abstract
Group activity recognition is the task of understanding the activity conducted by a group of people as a whole in a multi-person video. Existing models for this task are often impractical in that they demand ground-truth bounding box labels of actors even in testing or rely on off-the-shelf object detectors. Motivated by this, we propose a novel model for group activity recognition that depends neither on bounding box labels nor on object detectors. Our Transformer-based model localizes and encodes partial contexts of a group activity by leveraging the attention mechanism, and represents a video clip as a set of partial context embeddings. The embedding vectors are then aggregated to form a single group representation that reflects the entire context of an activity while capturing the temporal evolution of each partial context. Our method achieves outstanding performance on two…
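The aggregation step described in the abstract (track how each partial context evolves over time, then fuse all contexts into one group vector) can be illustrated with a simple pipeline. The sketch below assumes a temporal 1D convolution applied independently to each partial context, max pooling over time, mean pooling over contexts, and a linear classifier. These operator choices and the helper names (`temporal_conv`, `group_representation`, `classify`) are hypothetical; the summary does not specify the paper's exact aggregation.

```python
import numpy as np

def temporal_conv(emb, kernels):
    """Assumed 1D temporal convolution, applied independently to each partial context.

    emb:     (T, K, D) partial context embeddings over T frames.
    kernels: (S, D, D) weights for S temporal taps (S odd), shared across contexts.
    Returns (T, K, D) using 'same' zero padding along time, followed by ReLU.
    """
    S = kernels.shape[0]
    pad = S // 2
    padded = np.pad(emb, ((pad, pad), (0, 0), (0, 0)))  # zero-pad the time axis only
    T = emb.shape[0]
    out = np.zeros_like(emb)
    for s in range(S):
        # Tap s reads frame t + s - pad for every output frame t.
        out += padded[s:s + T] @ kernels[s]
    return np.maximum(out, 0.0)

def group_representation(emb, kernels):
    """Hypothetical aggregation: temporal conv, max over time, mean over contexts."""
    h = temporal_conv(emb, kernels)   # (T, K, D): each context's temporal evolution
    per_context = h.max(axis=0)       # (K, D): strongest response over time per context
    return per_context.mean(axis=0)   # (D,): single group representation

def classify(group_vec, w_cls, b_cls):
    # Linear layer producing one logit per group activity class.
    return group_vec @ w_cls + b_cls

# Usage with random stand-in embeddings (sizes and class count are placeholders).
rng = np.random.default_rng(0)
T, K, D, C = 4, 8, 32, 8
emb = rng.standard_normal((T, K, D))
kernels = rng.standard_normal((3, D, D)) / np.sqrt(3 * D)
g = group_representation(emb, kernels)
logits = classify(g, rng.standard_normal((D, C)), np.zeros(C))
print(g.shape, logits.shape)  # (32,) (8,)
```

Pooling over contexts with a mean keeps the group vector independent of the order of the K partial contexts, which suits a set-style representation; max pooling over time is just one simple way to summarize each context's trajectory.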
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Softmax · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention
