Enriching Local and Global Contexts for Temporal Action Localization

Zixin Zhu (Xi'an Jiaotong University); Wei Tang (University of Illinois at Chicago); Le Wang (Xi'an Jiaotong University); Nanning Zheng (Xi'an Jiaotong University); Gang Hua (Wormpex AI Research)

arXiv:2107.12960 · cs.CV · August 10, 2021

TL;DR

This paper introduces ContextLoc, a model that enhances local and global contextual information for improved temporal action localization, achieving state-of-the-art results on benchmark datasets.

Contribution

The paper proposes a framework that enriches local and global contexts in a two-stage temporal action localization (TAL) model, adding a context adaptation module and inter-proposal relation modeling.

Findings

1. Outperforms recent state-of-the-art methods on the THUMOS14 and ActivityNet v1.3 datasets.
2. Enriching local and global contexts improves temporal localization accuracy.
3. The proposed context adaptation module effectively adapts the global context to individual proposals.

Abstract

Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching both the local and global contexts in the popular two-stage temporal localization framework, where action proposals are first generated followed by action classification and temporal boundary regression. Our proposed model, dubbed ContextLoc, can be divided into three sub-networks: L-Net, G-Net and P-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, which is formulated as a query-and-retrieval process. G-Net enriches the global context via higher-level modeling of the video-level representation. In addition, we introduce a novel context…
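
The "query-and-retrieval" idea mentioned for L-Net can be illustrated with a small, generic sketch. This is not the authors' implementation: the function names `retrieve_local_context` and `enrich` are hypothetical, and the use of a scaled dot product followed by a softmax is an assumption made for illustration; the paper may define similarity and aggregation differently. The sketch only shows the general pattern, in which a proposal-level feature acts as a query, snippet-level features are scored against it, and the retrieved context is added back to the query.

```python
import math


def retrieve_local_context(query, snippets):
    """Score each snippet feature against the query and return
    (weights, context), where context is the weighted sum of snippets.

    Assumption: scaled dot-product similarity with a softmax; this is an
    illustrative choice, not taken from the paper.
    """
    dim = len(query)
    scores = [
        sum(q * s for q, s in zip(query, snippet)) / math.sqrt(dim)
        for snippet in snippets
    ]
    # Numerically stable softmax over the snippet scores.
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of snippet features, one value per feature dimension.
    context = [
        sum(w * snippet[d] for w, snippet in zip(weights, snippets))
        for d in range(dim)
    ]
    return weights, context


def enrich(query, snippets):
    """Add the retrieved context to the query (residual-style enrichment)."""
    _, context = retrieve_local_context(query, snippets)
    return [q + c for q, c in zip(query, context)]


# Toy example: one 2-D proposal feature and three 2-D snippet features.
proposal = [1.0, 0.0]
snippet_features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = retrieve_local_context(proposal, snippet_features)
enriched = enrich(proposal, snippet_features)
```

In the toy example, the snippet most similar to the proposal feature receives the largest weight, so the enriched feature is pulled toward the snippets that best match the query.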

Code & Models

Repositories

buxiangzhiren/contextloc (PyTorch, official)
