Efficient Activity Detection in Untrimmed Video with Max-Subgraph Search

Chao-Yeh Chen; Kristen Grauman

arXiv:1607.02815·cs.CV·July 12, 2016

TL;DR

This paper introduces a fast, graph-based method for activity detection in untrimmed videos, enabling more accurate localization by efficiently searching for the best activity instances in space-time.

Contribution

It unifies activity categorization and localization into a maximum-weight subgraph problem, allowing efficient detection of non-cubically shaped activity regions in videos.

Findings

1. Demonstrates improved speed over existing methods.

2. Achieves higher detection accuracy on four datasets.

3. Enables search over broader space-time regions.

Abstract

We propose an efficient approach for activity detection in video that unifies activity categorization with space-time localization. The main idea is to pose activity detection as a maximum-weight connected subgraph problem. Offline, we learn a binary classifier for an activity category using positive video exemplars that are "trimmed" in time to the activity of interest. Then, given a novel untrimmed video sequence, we decompose it into a 3D array of space-time nodes, which are weighted based on the extent to which their component features support the learned activity model. To perform detection, we then directly localize instances of the activity by solving for the maximum-weight connected subgraph in the test video's space-time graph. We show that this detection strategy permits an efficient branch-and-cut solution for the best-scoring, and possibly non-cubically…
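The maximum-weight connected subgraph formulation described in the abstract can be illustrated on a toy grid. The sketch below is a minimal illustration, not the authors' implementation: it uses exhaustive enumeration in place of the branch-and-cut solver named in the abstract, assumes 6-connectivity between neighbouring space-time nodes, and uses made-up node weights. The function names (`grid_neighbours`, `is_connected`, `max_weight_connected_subgraph`) are hypothetical.

```python
from itertools import combinations, product


def grid_neighbours(shape):
    """Map each (t, y, x) node to its 6-connected neighbours (an assumed connectivity)."""
    T, H, W = shape
    nbrs = {}
    for t, y, x in product(range(T), range(H), range(W)):
        candidates = [
            (t + 1, y, x), (t - 1, y, x),
            (t, y + 1, x), (t, y - 1, x),
            (t, y, x + 1), (t, y, x - 1),
        ]
        nbrs[(t, y, x)] = [
            c for c in candidates
            if 0 <= c[0] < T and 0 <= c[1] < H and 0 <= c[2] < W
        ]
    return nbrs


def is_connected(nodes, nbrs):
    """Depth-first search restricted to `nodes`; True if they form one component."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nbrs[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def max_weight_connected_subgraph(weights, nbrs):
    """Brute force over all non-empty node subsets; only feasible for tiny graphs."""
    best_score, best_nodes = float("-inf"), frozenset()
    all_nodes = list(weights)
    for k in range(1, len(all_nodes) + 1):
        for subset in combinations(all_nodes, k):
            if is_connected(subset, nbrs):
                score = sum(weights[n] for n in subset)
                if score > best_score:
                    best_score, best_nodes = score, frozenset(subset)
    return best_score, best_nodes


# Hypothetical node weights for a 2 x 2 x 2 (time, height, width) volume.
# Positive weights stand in for nodes whose features support the activity model.
toy_weights = {
    (0, 0, 0): 2.0, (0, 0, 1): -1.0,
    (0, 1, 0): 1.5, (0, 1, 1): -3.0,
    (1, 0, 0): -2.0, (1, 0, 1): -1.0,
    (1, 1, 0): 1.0, (1, 1, 1): -0.5,
}
nbrs = grid_neighbours((2, 2, 2))
score, nodes = max_weight_connected_subgraph(toy_weights, nbrs)
print(score, sorted(nodes))  # 4.5 [(0, 0, 0), (0, 1, 0), (1, 1, 0)]
```

In this toy case the selected nodes form an L shape across two time steps. The smallest cuboid containing them would also have to include the node at (1, 0, 0) with weight -2.0, which is the kind of case where a connected-subgraph search can outscore a space-time window search. Exhaustive enumeration grows exponentially with the number of nodes, so it serves only to make the objective concrete.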

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings