Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding
Daizong Liu, Xiaoye Qu, Pan Zhou

TL;DR
This paper introduces IA-Net, an iterative alignment framework for temporal sentence grounding that progressively refines vision-language feature alignment through multiple reasoning steps, improving accuracy over existing single-step methods.
Contribution
The paper proposes an iterative alignment network with multi-step reasoning, feature padding, and calibration modules to enhance vision-language alignment in TSG.
Findings
Outperforms state-of-the-art methods on three benchmarks.
Effectively captures fine-grained cross-modal relations.
Robustly refines temporal boundary predictions.
Abstract
The key to temporal sentence grounding (TSG) lies in learning an effective alignment between the vision and language features extracted from an untrimmed video and a sentence description. Existing methods mainly rely on vanilla soft attention to perform this alignment in a single step. However, single-step attention is insufficient in practice, since complicated inter- and intra-modal relations are usually captured through multi-step reasoning. In this paper, we propose an Iterative Alignment Network (IA-Net) for the TSG task, which iteratively models interactions among inter- and intra-modal features over multiple steps for more accurate grounding. Specifically, during the iterative reasoning process, we pad multi-modal features with learnable parameters to alleviate the nowhere-to-attend problem of non-matched frame-word pairs, and enhance the basic co-attention mechanism…
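The padding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`padded_attention`, `iterative_alignment`), the scaled dot-product scoring, the residual update, and the fixed (rather than learned) pad vectors are all assumptions made for illustration. The sketch appends one extra "pad" slot to the keys so that a frame matching no word can place its attention mass on the pad instead of being forced onto an unrelated word, and it repeats the two-way attention for several steps.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def padded_attention(queries, keys, pad):
    # Attend from each query to the keys plus one extra pad slot.
    # In a trained model the pad would be a learnable parameter;
    # here it is a fixed vector (assumption for illustration).
    keys_padded = keys + [pad]
    scale = math.sqrt(len(pad))
    attended, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys_padded])
        ctx = [sum(wi * k[j] for wi, k in zip(w, keys_padded))
               for j in range(len(pad))]
        weights.append(w)
        attended.append(ctx)
    return attended, weights

def iterative_alignment(frames, words, frame_pad, word_pad, steps=3):
    # Hypothetical multi-step loop: each step lets frames attend to words
    # and words attend to frames, then applies a residual update.
    for _ in range(steps):
        frames_ctx, _ = padded_attention(frames, words, word_pad)
        words_ctx, _ = padded_attention(words, frames, frame_pad)
        frames = [[f + c for f, c in zip(fv, cv)]
                  for fv, cv in zip(frames, frames_ctx)]
        words = [[w + c for w, c in zip(wv, cv)]
                 for wv, cv in zip(words, words_ctx)]
    return frames, words

# Toy example: the third frame is dissimilar to both words.
frames = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
words = [[1.0, 0.0], [0.0, 1.0]]
pad = [0.0, 0.0]
_, weights = padded_attention(frames, words, pad)
refined_frames, refined_words = iterative_alignment(frames, words, pad, pad)
```

In the toy example, the third frame has a negative score against both words and a zero score against the pad, so its largest attention weight falls on the pad slot. Without the pad, softmax would still split all of its attention between two non-matching words.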
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization
