Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding
Xiaolong Sun, Liushuai Shi, Le Wang, Sanping Zhou, Kun Xia, Yabing Wang, Gang Hua

TL;DR
This paper introduces RGTR, a novel transformer-based model for temporal sentence grounding that uses explicit regional guidance to diversify moment queries, reducing redundancy and improving prediction accuracy.
Contribution
The paper proposes a region-guided approach with anchor pairs as explicit regional cues, replacing learnable queries to enhance diversity and reduce redundancy in temporal grounding.
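To make the idea of an anchor pair concrete, here is a minimal sketch, assuming each anchor pair is a (center, width) tuple in normalized video time that covers one temporal region. The uniform grid, the function names `make_anchor_pairs` and `anchor_to_span`, and the parameter values are all hypothetical and chosen only for illustration; the summary above does not say how RGTR actually constructs its anchor pairs.

```python
def make_anchor_pairs(num_centers, widths):
    """Build hypothetical (center, width) anchor pairs in normalized [0, 1] time.

    Assumption: a uniform grid of centers crossed with a few widths.
    This is an illustration, not the construction used in the paper.
    """
    anchors = []
    for i in range(num_centers):
        center = (i + 0.5) / num_centers  # midpoint of the i-th equal-sized bin
        for width in widths:
            anchors.append((center, width))
    return anchors


def anchor_to_span(center, width):
    """Convert a (center, width) anchor to a (start, end) span clipped to [0, 1]."""
    start = max(0.0, center - width / 2)
    end = min(1.0, center + width / 2)
    return start, end


if __name__ == "__main__":
    for center, width in make_anchor_pairs(num_centers=4, widths=[0.1, 0.3]):
        print((center, width), "->", anchor_to_span(center, width))
```

Because each anchor starts in a different region, queries initialized from them begin with distinct temporal priors, which is the intuition behind using explicit regional cues rather than undifferentiated learnable queries.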
Findings
RGTR outperforms state-of-the-art methods on multiple datasets.
Explicit regional guidance improves prediction diversity.
IoU-aware scoring enhances proposal quality.
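The IoU referred to in the findings is the temporal intersection-over-union between a predicted span and a ground-truth span. The sketch below shows only that standard quantity; the function name `temporal_iou` is an assumption, and how RGTR folds IoU into its proposal score is not specified in this summary.

```python
def temporal_iou(span_a, span_b):
    """Temporal IoU of two (start, end) spans given in the same time unit."""
    start_a, end_a = span_a
    start_b, end_b = span_b
    # Overlap length is zero when the spans do not intersect.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction covering 5-15 s against a ground truth of 0-10 s.
    print(temporal_iou((5.0, 15.0), (0.0, 10.0)))
```

A score that tracks this value ranks a well-localized proposal above a poorly localized one even when both are judged relevant to the query.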
Abstract
Temporal sentence grounding is a challenging task that aims to localize the moment spans relevant to a language description. Although recent DETR-based models have achieved notable progress by leveraging multiple learnable moment queries, they suffer from overlapped and redundant proposals, leading to inaccurate predictions. We attribute this limitation to the lack of task-related guidance for the learnable queries to serve a specific mode. Furthermore, the complex solution space generated by variable and open-vocabulary language descriptions complicates optimization, making it harder for learnable queries to distinguish each other adaptively. To tackle this limitation, we present a Region-Guided TRansformer (RGTR) for temporal sentence grounding, which diversifies moment queries to eliminate overlapped and redundant predictions. Instead of using learnable queries, RGTR adopts a set of…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Sparse Evolutionary Training
