Detecting events and key actors in multi-person videos
Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

TL;DR
This paper introduces an attention-based model for multi-person video event detection that identifies key actors automatically, without annotations of who or where those actors are, and outperforms prior methods on a new basketball dataset.
Contribution
The paper integrates an attention mechanism with RNNs to detect events and key actors in multi-person videos without requiring annotations of who or where the key actors are.
Findings
Outperforms state-of-the-art methods in event classification and detection
Successfully localizes relevant players using attention mechanism
Introduces a new basketball dataset with 14K event annotations
Abstract
Multi-person event recognition is a challenging task, often with many people active in the scene but only a small subset contributing to an actual event. In this paper, we propose a model which learns to detect events in such videos while automatically "attending" to the people responsible for the event. Our model does not use explicit annotations regarding who or where those people are during training and testing. In particular, we track people in videos and use a recurrent neural network (RNN) to represent the track features. We learn time-varying attention weights to combine these features at each time-instant. The attended features are then processed using another RNN for event detection/classification. Since most video datasets with multiple people are restricted to a small number of videos, we also collected a new basketball dataset comprising 257 basketball games with 14K event…
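The attention step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the additive (tanh) scoring function, the parameter names `W_f`, `W_h`, `v`, and all dimensions are assumptions chosen for illustration, and random values stand in for the RNN track features and the event RNN state. It only shows how per-player features at one time instant might be combined with softmax attention weights into a single attended feature.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(track_feats, event_state, W_f, W_h, v):
    """Combine per-player features at one time instant.

    track_feats: (N, D) features for N tracked players (stand-in for RNN outputs)
    event_state: (H,) state of the event-level RNN (stand-in)
    W_f, W_h, v: hypothetical attention parameters (assumed additive scoring)
    """
    # One scalar relevance score per player.
    scores = np.tanh(track_feats @ W_f + event_state @ W_h) @ v  # shape (N,)
    weights = softmax(scores)                                    # sums to 1
    attended = weights @ track_feats                             # shape (D,)
    return weights, attended

rng = np.random.default_rng(0)
N, D, H, A = 5, 8, 6, 4  # players, feature dim, state dim, attention dim (all assumed)
track_feats = rng.normal(size=(N, D))
event_state = rng.normal(size=H)
W_f = rng.normal(size=(D, A))
W_h = rng.normal(size=(H, A))
v = rng.normal(size=A)

weights, attended = attend(track_feats, event_state, W_f, W_h, v)
key_actor = int(np.argmax(weights))  # player with the largest attention weight
```

In a full model the weights would be recomputed at every time instant and the attended feature fed to the second RNN; here the largest weight is simply read off as a candidate key actor.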
Videos
Detecting Events and Key Actors in Multi-Person Videos · YouTube
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization
