MAC: Mining Activity Concepts for Language-based Temporal Localization

Runzhou Ge; Jiyang Gao; Kan Chen; Ram Nevatia

arXiv:1811.08925·cs.CV·November 26, 2018·20 cites

TL;DR

This paper introduces ACL (Activity Concepts based Localizer), a method for language-based temporal localization in untrimmed videos. ACL mines semantic activity concepts from both the video and the language query, and it improves accuracy over previous methods by over 5% on Charades-STA and TACoS.

Contribution

The paper proposes ACL, which encodes semantic activity concepts from verb-object pairs in language queries and from the prediction scores of activity classifiers on video, improving localization performance in untrimmed videos.

Findings

01

ACL outperforms state-of-the-art methods by over 5% on Charades-STA and TACoS datasets.

02

The method effectively leverages semantic cues from language and visual modalities.

03

ACL can also regress the boundaries of sliding windows to produce its localization results.

Abstract

We address the problem of language-based temporal localization in untrimmed videos. Compared to temporal localization with fixed categories, this problem is more challenging as the language-based queries not only have no pre-defined activity list but also may contain complex descriptions. Previous methods address the problem by considering features from video sliding windows and language queries and learning a subspace to encode their correlation, an approach that ignores rich semantic cues about activities in videos and queries. We propose to mine activity concepts from both video and language modalities by applying the actionness score enhanced Activity Concepts based Localizer (ACL). Specifically, the novel ACL encodes the semantic concepts from verb-obj pairs in language queries and leverages activity classifiers' prediction scores to encode visual concepts. Besides, ACL also has the capability…
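
The abstract refers to video sliding windows and to regressing them. As a rough illustration of that general idea only, the sketch below enumerates candidate windows over a video and shifts one window's boundaries by a pair of offsets. It assumes windows are (start, end) pairs in seconds and that regression outputs are additive offsets. The function names, window lengths, and overlap value are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start, end) in seconds; an assumed representation


def sliding_windows(duration: float, lengths: List[float], overlap: float = 0.5) -> List[Window]:
    """Enumerate candidate windows of each length that fit inside the video.

    `overlap` is the fraction shared by consecutive windows of the same length
    (an illustrative default, not a value from the paper).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    windows: List[Window] = []
    for length in lengths:
        stride = length * (1.0 - overlap)
        i = 0
        # Compute each start from the index to avoid accumulating float error.
        while i * stride + length <= duration:
            start = i * stride
            windows.append((start, start + length))
            i += 1
    return windows


def refine(window: Window, offsets: Tuple[float, float]) -> Window:
    """Shift a window's start and end by regressed offsets (assumed additive)."""
    start, end = window
    d_start, d_end = offsets
    return (start + d_start, end + d_end)


candidates = sliding_windows(duration=10.0, lengths=[4.0], overlap=0.5)
adjusted = refine(candidates[1], (0.5, -1.0))
```

In a full system the offsets would come from a learned model that scores each window against the query; here they are supplied by hand to keep the sketch self-contained.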

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization