Towards Diverse Temporal Grounding under Single Positive Labels

Hao Zhou; Chongyang Zhang; Yanjun Chen; Chuanping Hu

arXiv:2303.06545 · cs.CV · March 14, 2023 · 1 citation

Open Access · 1 repository

TL;DR

This paper introduces a novel framework for temporal grounding in videos that accounts for multiple moments described by a single query, using positive moment estimation and diverse regression to improve retrieval accuracy.

Contribution

The paper reformulates temporal grounding as a one-vs-many optimization problem under single positive labels and proposes the DTG-SPL framework, which mainly consists of a positive moment estimation (PME) module and a diverse moment regression (DMR) module.

Findings

01 DTG-SPL outperforms existing methods on the Charades-STA and ActivityNet Captions datasets.

02 It mines potential positive moments so that a single query can be matched to multiple relevant video segments.

03 It achieves superior results under both single-label and multi-label evaluation metrics.
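The findings do not name the metrics. Temporal grounding results are commonly scored by temporal IoU between a predicted and an annotated moment, so the sketch below shows that computation. It assumes moments are `(start, end)` pairs in seconds. The `best_iou` helper, which scores a prediction against several annotated moments, is a hypothetical illustration of a multi-label setting and not the paper's definition.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def best_iou(pred, gts):
    """Assumed multi-label variant: best IoU against any annotated moment."""
    return max(temporal_iou(pred, gt) for gt in gts)


# Single-label case: one annotated moment per query.
single = temporal_iou((0.0, 10.0), (5.0, 15.0))

# Multi-label case: the query matches two moments; the second one fits.
multi = best_iou((20.0, 30.0), [(0.0, 10.0), (22.0, 30.0)])
```

With a single annotation, a prediction that lands on a second valid moment scores zero, which is the false-negative problem the abstract describes.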

Abstract

Temporal grounding aims to retrieve moments of the described event within an untrimmed video by a language query. Typically, existing methods assume annotations are precise and unique, yet one query may describe multiple moments in many cases. Hence, simply taking it as a one-vs-one mapping task and striving to match single-label annotations will inevitably introduce false negatives during optimization. In this study, we reformulate this task as a one-vs-many optimization problem under the condition of single positive labels. The unlabeled moments are considered unobserved rather than negative, and we explore mining potential positive moments to assist in multiple moment retrieval. In this setting, we propose a novel Diverse Temporal Grounding framework, termed DTG-SPL, which mainly consists of a positive moment estimation (PME) module and a diverse moment regression (DMR) module. PME…
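To make the difference between "negative" and "unobserved" concrete, here is a toy sketch rather than the authors' implementation. It assumes each candidate moment has a predicted relevance probability and uses binary cross-entropy as a stand-in loss. The function names and the `pseudo_pos` argument (indices of mined potential positives) are hypothetical; how PME actually selects them is not recoverable from the truncated abstract.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one probability p and target y in {0, 1}."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def one_vs_one_loss(scores, pos_idx):
    """Every candidate except the labeled one is treated as a negative."""
    total = sum(bce(p, 1.0 if i == pos_idx else 0.0)
                for i, p in enumerate(scores))
    return total / len(scores)


def single_positive_loss(scores, pos_idx, pseudo_pos=()):
    """Unlabeled candidates are unobserved and skipped; only the labeled
    positive and any mined pseudo-positives (assumed given) contribute."""
    positives = {pos_idx, *pseudo_pos}
    return sum(bce(scores[i], 1.0) for i in positives) / len(positives)


# Candidate 1 is a second valid moment that the annotation missed.
scores = [0.9, 0.8, 0.1]
penalized = one_vs_one_loss(scores, pos_idx=0)
masked = single_positive_loss(scores, pos_idx=0)
mined = single_positive_loss(scores, pos_idx=0, pseudo_pos=(1,))
```

In the one-vs-one loss, the confident score on candidate 1 is punished as a false negative. The masked version removes that penalty, but a positives-only loss is degenerate on its own (predicting every candidate as positive minimizes it), so a practical system needs some additional estimation or regularization step.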

Code & Models

Repositories

zhouhaocv/dtg-spl · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition