Length Matters: Length-Aware Transformer for Temporal Sentence Grounding
Yifan Wang, Ziyi Liu, Xiaolong Sun, Jiawei Wang, Hongmin Liu

TL;DR
This paper introduces a length-aware transformer for temporal sentence grounding that uses length priors to give each query a specialized role, achieving state-of-the-art results on benchmark datasets.
Contribution
The paper proposes a length-aware transformer that assigns each query to segments of a specific length range, improving prediction accuracy in temporal sentence grounding (TSG).
Findings
Achieves state-of-the-art performance on three benchmarks.
Length priors improve query specialization and prediction accuracy.
Ablation studies confirm the effectiveness of length-aware design.
Abstract
Temporal sentence grounding (TSG) is a highly challenging task that aims to localize the temporal segment of an untrimmed video corresponding to a given natural language description. Benefiting from the design of learnable queries, DETR-based models have achieved substantial advancements in the TSG task. However, the absence of explicit supervision often causes the learned queries to overlap in roles, leading to redundant predictions. Therefore, we propose to improve TSG by making each query fulfill its designated role, leveraging the length priors of the video-description pairs. In this paper, we introduce the Length-Aware Transformer (LATR) for TSG, which assigns different queries to handle predictions based on varying temporal lengths. Specifically, we divide all queries into three groups, responsible for segments with short, middle, and long temporal durations, respectively.…
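The grouping idea in the abstract can be illustrated with a small sketch. It assumes that queries are split into three contiguous, near-equal index groups and that a ground-truth segment is routed to a group by its normalized length (segment duration divided by video duration). The thresholds `SHORT_MAX` and `MIDDLE_MAX`, and all function names, are hypothetical, because the truncated abstract gives no such details. Using the result to restrict which queries may be matched to a segment is also an assumption about how LATR trains, not a description of the authors' code.

```python
from typing import List

# Hypothetical boundaries on normalized length (duration / video duration).
# The paper's actual length definitions are not stated in the abstract.
SHORT_MAX = 0.2
MIDDLE_MAX = 0.5


def length_group(start: float, end: float, video_duration: float) -> int:
    """Return 0 (short), 1 (middle) or 2 (long) for a ground-truth segment."""
    if video_duration <= 0:
        raise ValueError("video_duration must be positive")
    ratio = (end - start) / video_duration
    if ratio <= SHORT_MAX:
        return 0
    if ratio <= MIDDLE_MAX:
        return 1
    return 2


def query_groups(num_queries: int, num_groups: int = 3) -> List[range]:
    """Split query indices into contiguous groups whose sizes differ by at most one."""
    base, extra = divmod(num_queries, num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)
        groups.append(range(start, start + size))
        start += size
    return groups


def eligible_queries(start: float, end: float, video_duration: float,
                     num_queries: int) -> range:
    """Indices of the queries assumed responsible for this segment's length group."""
    return query_groups(num_queries)[length_group(start, end, video_duration)]


if __name__ == "__main__":
    # A 4 s moment in a 100 s video falls in the hypothetical "short" group.
    print(eligible_queries(10.0, 14.0, 100.0, num_queries=30))
```

Under these assumptions, a detector would compute its matching cost only over `eligible_queries(...)` for each ground-truth segment, so queries in the "long" group never learn to predict short moments. The paper's actual assignment and matching rules may differ.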
