Background-Click Supervision for Temporal Action Localization

Le Yang, Junwei Han, Tao Zhao, Tianwei Lin, Dingwen Zhang, Jianxin Chen

arXiv:2111.12449·cs.CV·November 25, 2021

TL;DR

This paper introduces BackTAL, a weakly supervised temporal action localization method that replaces action-click annotations with background-click annotations, so the localizer is trained with labels on background frames rather than on action frames.

Contribution

The paper proposes background-click supervision and a two-fold modeling scheme that improve action localization accuracy over existing weakly supervised methods.

Findings

1. BackTAL outperforms previous weakly supervised methods on three benchmarks.
2. Background-click supervision effectively reduces background errors in localization.
3. The proposed modules improve the distinction between action and background frames.

Abstract

Weakly supervised temporal action localization aims at learning the instance-level action pattern from the video-level labels, where a significant challenge is action-context confusion. To overcome this challenge, one recent work builds an action-click supervision framework. It requires similar annotation costs but can steadily improve the localization performance when compared to the conventional weakly supervised methods. In this paper, by revealing that the performance bottleneck of the existing approaches mainly comes from the background errors, we find that a stronger action localizer can be trained with labels on the background video frames rather than those on the action frames. To this end, we convert the action-click supervision to the background-click supervision and develop a novel method, called BackTAL. Specifically, BackTAL implements two-fold modeling on the background…
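To make the idea of background-click supervision concrete, here is a minimal sketch of a frame-level loss that pushes annotator-clicked frames toward a background class. It is an illustration under stated assumptions, not BackTAL's actual objective: the function names, the layout with a dedicated background class, and the use of plain cross-entropy are all hypothetical, and the two-fold modeling mentioned in the abstract is not reproduced.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one frame's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def background_click_loss(frame_scores, clicked_frames, background_index):
    """Mean cross-entropy on clicked frames, with background as the target.

    Hypothetical helper (not from the paper):
      frame_scores     -- per-frame lists of class scores
      clicked_frames   -- indices of frames annotated as background
      background_index -- position of the assumed background class
    """
    if not clicked_frames:
        return 0.0
    losses = []
    for t in clicked_frames:
        probs = softmax(frame_scores[t])
        losses.append(-math.log(probs[background_index]))
    return sum(losses) / len(losses)


# Toy video: 4 frames, 2 action classes plus background as the last class.
scores = [
    [2.0, 0.1, 0.0],
    [0.2, 0.1, 3.0],  # already scored as background
    [0.0, 0.0, 0.0],  # uniform scores, no preference yet
    [1.5, 1.5, 0.2],
]
loss_confident = background_click_loss(scores, [1], background_index=2)
loss_uncertain = background_click_loss(scores, [2], background_index=2)
```

In this toy setup, a clicked frame that the model already scores as background contributes a smaller loss than one with uniform scores. In practice such a term would sit alongside the usual video-level classification loss.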

Code & Models

Repositories

vividle/backtal (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Multimodal Machine Learning Applications