Detector-Free Weakly Supervised Group Activity Recognition

Dongkeun Kim; Jinsung Lee; Minsu Cho; Suha Kwak

arXiv:2204.02139·cs.CV·April 6, 2022

TL;DR

This paper introduces a Transformer-based, detector-free model for weakly supervised group activity recognition. It needs neither bounding box annotations nor an object detector, and it outperforms existing weakly supervised methods on the Volleyball and NBA datasets.

Contribution

The proposed model uses attention to localize and encode partial contexts of a group activity, without relying on object detectors or bounding box labels.

Findings

01 Outperforms existing weakly supervised methods on Volleyball and NBA datasets.

02 Surpasses some models with stronger supervision.

03 Demonstrates effective encoding of temporal evolution of group activities.

Abstract

Group activity recognition is the task of understanding the activity conducted by a group of people as a whole in a multi-person video. Existing models for this task are often impractical in that they demand ground-truth bounding box labels of actors even in testing or rely on off-the-shelf object detectors. Motivated by this, we propose a novel model for group activity recognition that depends neither on bounding box labels nor on an object detector. Our Transformer-based model localizes and encodes partial contexts of a group activity by leveraging the attention mechanism, and represents a video clip as a set of partial context embeddings. The embedding vectors are then aggregated to form a single group representation that reflects the entire context of an activity while capturing the temporal evolution of each partial context. Our method achieves outstanding performance on two…
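The pipeline described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names (`attend`, `encode_clip`), the single-head attention without learned projections, the random inputs, and the mean-pooling aggregation are all assumptions made for illustration. In particular, mean pooling ignores frame order, so it only stands in for the learned aggregation that the abstract says captures temporal evolution.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """One query cross-attends over feature vectors (single head, no projections).

    Returns the attention-weighted feature and the attention weights, which
    act as a soft localization over the frame's spatial positions.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return pooled, weights


def encode_clip(clip, tokens):
    """Encode a clip as partial context embeddings plus one group vector.

    clip:   T frames, each a list of N location features of size D.
    tokens: K query vectors of size D (hypothetical stand-ins for learned tokens).
    partial[k][t] is token k's embedding at frame t.
    """
    partial = [[attend(tok, frame)[0] for frame in clip] for tok in tokens]
    dim = len(tokens[0])
    # Assumption: average over time, then over tokens. This discards temporal order.
    per_token = [
        [sum(emb[d] for emb in track) / len(track) for d in range(dim)]
        for track in partial
    ]
    group = [sum(vec[d] for vec in per_token) / len(per_token) for d in range(dim)]
    return group, partial


random.seed(0)
T, N, D, K = 4, 6, 8, 3  # frames, locations per frame, feature size, tokens
clip = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(N)] for _ in range(T)]
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
group, partial = encode_clip(clip, tokens)
print(len(partial), len(partial[0]), len(group))
```

Each token yields one embedding per frame, so the clip is represented as K tracks of T embeddings before aggregation. No actor boxes or detector outputs appear anywhere in the inputs, which is the point the abstract makes about detector-free operation.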

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Softmax · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention