Efficient Activity Detection in Untrimmed Video with Max-Subgraph Search
Chao-Yeh Chen, Kristen Grauman

TL;DR
This paper introduces a fast, graph-based method for activity detection in untrimmed videos, enabling more accurate localization by efficiently searching for the best activity instances in space-time.
Contribution
It unifies activity categorization and localization into a maximum-weight connected subgraph problem, allowing efficient detection of non-cubically shaped activity regions in videos.
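The maximum-weight connected subgraph objective can be written compactly. The notation here is ours and is not taken from the paper: $G = (V, E)$ is the space-time graph of a test video, $w(v)$ is the classifier-derived weight of node $v$, and $G[S]$ is the subgraph induced by a node set $S$.

```latex
S^{*} \;=\; \operatorname*{arg\,max}_{\substack{S \subseteq V \\ G[S]\ \text{connected}}} \;\sum_{v \in S} w(v)
```

Because $w(v)$ may be negative, the maximizer need not fill its bounding box, which is what allows non-cubically shaped detections.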
Findings
Runs faster than multiple existing search strategies.
Achieves higher detection accuracy on four datasets.
Searches a broader space of candidate space-time regions than was previously practical.
Abstract
We propose an efficient approach for activity detection in video that unifies activity categorization with space-time localization. The main idea is to pose activity detection as a maximum-weight connected subgraph problem. Offline, we learn a binary classifier for an activity category using positive video exemplars that are "trimmed" in time to the activity of interest. Then, given a novel untrimmed video sequence, we decompose it into a 3D array of space-time nodes, which are weighted based on the extent to which their component features support the learned activity model. To perform detection, we then directly localize instances of the activity by solving for the maximum-weight connected subgraph in the test video's space-time graph. We show that this detection strategy permits an efficient branch-and-cut solution for the best-scoring---and possibly non-cubically…
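The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes node weights are additive sums of per-visual-word classifier weights, assumes 6-connectivity between neighboring space-time cells, and replaces the branch-and-cut solver with exhaustive search that is only feasible on toy grids. All function and parameter names (`node_weights`, `max_weight_connected_subgraph`, `cell`, and so on) are hypothetical.

```python
from itertools import combinations

import numpy as np


def node_weights(features, word_weights, shape, cell):
    """Accumulate per-feature classifier weights into a (T, Y, X) grid of nodes.

    Assumption: each feature is (t, y, x, word) and contributes word_weights[word].
    """
    grid = np.zeros(shape)
    for t, y, x, word in features:
        grid[t // cell[0], y // cell[1], x // cell[2]] += word_weights[word]
    return grid


def neighbors(idx, shape):
    """Yield the 6-connected neighbors of a node inside the grid (an assumption)."""
    for axis in range(3):
        for step in (-1, 1):
            n = list(idx)
            n[axis] += step
            if 0 <= n[axis] < shape[axis]:
                yield tuple(n)


def is_connected(nodes, shape):
    """Depth-first search restricted to `nodes`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for n in neighbors(stack.pop(), shape):
            if n in nodes and n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == nodes


def max_weight_connected_subgraph(grid):
    """Exhaustive search over all node subsets; a stand-in for branch-and-cut."""
    all_nodes = list(np.ndindex(grid.shape))
    best_score, best_set = float("-inf"), None
    for k in range(1, len(all_nodes) + 1):
        for subset in combinations(all_nodes, k):
            if is_connected(subset, grid.shape):
                score = sum(grid[n] for n in subset)
                if score > best_score:
                    best_score, best_set = score, subset
    return best_score, set(best_set)


# Toy 2x2x2 grid indexed (t, y, x); positive weights support the activity.
weights = np.array([[[2.0, 1.0], [-4.0, 3.0]],
                    [[-1.0, -5.0], [-2.0, 1.5]]])
score, nodes = max_weight_connected_subgraph(weights)
print(float(score), sorted(nodes))
```

In this toy grid the best connected region is an L-shaped chain of four positive nodes with total weight 7.5, whereas the full 2x2x2 cuboid containing them sums to -4.5. That gap is the motivation for searching over non-cubically shaped regions.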
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
