Learning to track for spatio-temporal action localization

Philippe Weinzaepfel; Zaid Harchaoui; Cordelia Schmid

arXiv:1506.01929 · cs.CV · September 29, 2015 · 72 cites

Open Access

TL;DR

This paper introduces a method for spatio-temporal action localization in videos. It detects frame-level proposals, tracks the high-scoring ones, and scores the resulting tracks with CNN features and a spatio-temporal motion histogram, achieving state-of-the-art results on UCF-Sports, J-HMDB, and UCF-101.

Contribution

It presents a comprehensive approach integrating detection, tracking, and temporal localization for improved action localization performance.

Findings

1. Outperforms the state of the art on the UCF-Sports, J-HMDB, and UCF-101 datasets

2. Improves mAP over the prior state of the art by 15%, 7%, and 12% on UCF-Sports, J-HMDB, and UCF-101, respectively

3. Effectively combines static, motion, and track-level features

Abstract

We propose an effective approach for spatio-temporal action localization in realistic videos. The approach first detects proposals at the frame-level and scores them with a combination of static and motion CNN features. It then tracks high-scoring proposals throughout the video using a tracking-by-detection approach. Our tracker relies simultaneously on instance-level and class-level detectors. The tracks are scored using a spatio-temporal motion histogram, a descriptor at the track level, in combination with the CNN features. Finally, we perform temporal localization of the action using a sliding-window approach at the track level. We present experimental results for spatio-temporal localization on the UCF-Sports, J-HMDB and UCF-101 action localization datasets, where our approach outperforms the state of the art with a margin of 15%, 7% and 12% respectively in mAP.
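
The final step of the abstract, sliding-window temporal localization at the track level, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `best_temporal_window`, the window lengths, the stride, and the use of a plain mean of per-frame scores are all assumptions made for illustration. The paper scores windows with CNN features combined with the spatio-temporal motion histogram, which is not reproduced here.

```python
from typing import Optional, Sequence, Tuple


def best_temporal_window(
    frame_scores: Sequence[float],
    window_lengths: Sequence[int] = (20, 30, 40),  # assumed values, not from the paper
    stride: int = 1,                               # assumed value, not from the paper
) -> Tuple[int, int, float]:
    """Slide windows of several lengths along one track and return the
    (start, end_exclusive, mean_score) of the highest-scoring window.

    `frame_scores` holds one score per frame of a single track. Using the
    mean score as the window score is a stand-in for the paper's scoring.
    """
    n = len(frame_scores)
    if n == 0:
        raise ValueError("frame_scores must not be empty")
    if not window_lengths:
        raise ValueError("window_lengths must not be empty")
    if stride < 1:
        raise ValueError("stride must be at least 1")

    # Prefix sums make each window's sum an O(1) lookup.
    prefix = [0.0]
    for score in frame_scores:
        prefix.append(prefix[-1] + score)

    best: Optional[Tuple[int, int, float]] = None
    for length in window_lengths:
        length = min(length, n)  # clip windows longer than the track
        for start in range(0, n - length + 1, stride):
            end = start + length
            mean = (prefix[end] - prefix[start]) / length
            # Strict comparison keeps the first window found on ties.
            if best is None or mean > best[2]:
                best = (start, end, mean)

    assert best is not None
    return best
```

For a 25-frame track whose scores are 1.0 on frames 10 to 14 and 0.0 elsewhere, calling `best_temporal_window(scores, window_lengths=(5, 10))` selects the window covering frames 10 to 14. Normalizing by window length is one simple way to compare windows of different durations; other choices are possible.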

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Human Motion and Animation