Multi-granularity Generator for Temporal Action Proposal

Yuan Liu; Lin Ma; Yifeng Zhang; Wei Liu; Shih-Fu Chang

arXiv:1811.11524 · cs.CV · April 15, 2019 · 19 cites

TL;DR

This paper introduces a multi-granularity generator that combines coarse segment proposals and fine frame actionness evaluation to improve temporal action proposal accuracy in untrimmed videos, achieving state-of-the-art results.

Contribution

The paper presents a novel multi-granularity generator (MGG) that integrates segment proposal and frame actionness components for better temporal action localization.

Findings

01. Outperforms state-of-the-art methods on the THUMOS-14 and ActivityNet-1.3 datasets.

02. The model is end-to-end trainable and produces higher-quality proposals.

03. Classifying the generated proposals improves video detection accuracy.

Abstract

Temporal action proposal generation is an important task, aiming to localize the video segments containing human actions in an untrimmed video. In this paper, we propose a multi-granularity generator (MGG) to perform temporal action proposal from different granularity perspectives, relying on video visual features equipped with position embedding information. First, we propose to use a bilinear matching model to exploit the rich local information within the video sequence. Afterwards, two components, namely the segment proposal producer (SPP) and the frame actionness producer (FAP), are combined to perform the task of temporal action proposal at two distinct granularities. SPP considers the whole video in the form of a feature pyramid and generates segment proposals from a coarse perspective, while FAP carries out a finer actionness evaluation for each video frame. Our proposed MGG…
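
To make the two granularities concrete, here is a minimal sketch of how a coarse segment proposal could be combined with per-frame actionness scores. It is an illustration under stated assumptions, not the authors' implementation: the function names `refine_segment` and `segment_score`, the `window` parameter, and the rule of snapping boundaries to the sharpest nearby change in actionness are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def refine_segment(
    actionness: List[float],
    segment: Tuple[int, int],
    window: int = 2,
) -> Tuple[int, int]:
    """Snap a coarse (start, end) frame segment to nearby actionness changes.

    Hypothetical rule (an assumption, not taken from the paper): within
    `window` frames of each coarse boundary, pick the start frame with the
    largest rise in actionness and the end frame with the largest drop.
    Both indices are inclusive.
    """
    n = len(actionness)
    start, end = segment

    def rise(t: int) -> float:
        # Increase in actionness when entering frame t.
        prev = actionness[t - 1] if t > 0 else 0.0
        return actionness[t] - prev

    def fall(t: int) -> float:
        # Decrease in actionness when leaving frame t.
        nxt = actionness[t + 1] if t + 1 < n else 0.0
        return actionness[t] - nxt

    start_candidates = range(max(0, start - window), min(n - 1, start + window) + 1)
    end_candidates = range(max(0, end - window), min(n - 1, end + window) + 1)

    new_start = max(start_candidates, key=rise)
    new_end = max(end_candidates, key=fall)

    # Keep the coarse proposal if the refined boundaries would cross.
    if new_start > new_end:
        return segment
    return new_start, new_end


def segment_score(actionness: List[float], segment: Tuple[int, int]) -> float:
    """Mean frame actionness inside an inclusive (start, end) segment."""
    start, end = segment
    return sum(actionness[start:end + 1]) / (end - start + 1)


# Toy example: frame-level scores (fine granularity) and one coarse proposal.
frame_scores = [0.1, 0.1, 0.9, 0.95, 0.9, 0.85, 0.1, 0.1]
coarse = (1, 6)
refined = refine_segment(frame_scores, coarse, window=2)
print(refined, segment_score(frame_scores, refined))
```

In this toy input the coarse proposal overshoots the action on both sides, and the frame-level scores pull both boundaries inward. A learned model would replace the hand-written rule, but the division of labor is the same: segments supply candidates, frames supply precise boundaries.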

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization