Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network
Xiang Fang, Wanlong Fang, Changshuo Wang, Daizong Liu, Keke Tang, Jianfeng Dong, Pan Zhou, Beibei Li

TL;DR
This paper introduces a multi-pair temporal sentence grounding framework that co-trains multiple video-query pairs, leveraging shared semantics and prototypes to improve efficiency and accuracy in locating relevant video segments.
Contribution
It proposes a novel multi-pair co-training approach with cross-modal contrast, prototype alignment, and adaptive negative selection for more effective temporal sentence grounding.
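To make the idea of cross-modal contrast over several video-query pairs more concrete, here is a minimal sketch of a generic symmetric InfoNCE loss. It is not the authors' formulation, which this summary does not spell out. The function names, the `temperature` value, and the assumption that each video and query has already been encoded into a fixed-length vector are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cross_modal_contrastive_loss(video_embs, query_embs, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: video_embs[i] and query_embs[i] are the matched pair, and
    every other video/query combination in the batch acts as a negative.
    """
    assert video_embs and len(video_embs) == len(query_embs)
    v = [l2_normalize(x) for x in video_embs]
    q = [l2_normalize(x) for x in query_embs]
    n = len(v)

    # sim[i][j]: temperature-scaled cosine similarity of video i and query j.
    sim = [
        [sum(a * b for a, b in zip(v[i], q[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def softmax_cross_entropy(logits, target):
        # Numerically stable log-sum-exp minus the logit of the true match.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        return log_z - logits[target]

    # Video-to-query direction: each row should pick out its own query.
    v2q = sum(softmax_cross_entropy(sim[i], i) for i in range(n)) / n
    # Query-to-video direction: each column should pick out its own video.
    q2v = sum(
        softmax_cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (v2q + q2v)
```

With correctly matched pairs the loss is close to zero, and it grows when videos and queries are mismatched. This is the signal that lets a batch of pairs inform one another during training, rather than each pair being optimized in isolation.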
Findings
Outperforms existing methods in accuracy and efficiency
Effectively models cross-modal semantic relationships
Reduces the time spent re-learning redundant knowledge across pairs
Abstract
Given some video-query pairs with untrimmed videos and sentence queries, temporal sentence grounding (TSG) aims to locate query-relevant segments in these videos. Although previous respectable TSG methods have achieved remarkable success, they train each video-query pair separately and ignore the relationship between different pairs. We observe that the similar video/query content not only helps the TSG model better understand and generalize the cross-modal representation but also assists the model in locating some complex video-query pairs. Previous methods follow a single-thread framework that cannot co-train different pairs and usually spends much time re-obtaining redundant knowledge, limiting their real-world applications. To this end, in this paper, we pose a brand-new setting: Multi-Pair TSG, which aims to co-train these pairs. In particular, we propose a novel video-query…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: ALIGN
