Knowledge-Refined Dual Context-Aware Network for Partially Relevant Video Retrieval
Junkai Yang, Qirui Wang, Yaoqing Jin, Shuai Ma, Minghan Xu, Shanmin Pang

TL;DR
This paper introduces KDC-Net, a novel network that improves partially relevant video retrieval by refining semantic understanding and temporal focus through dual context-aware modules and knowledge distillation.
Contribution
The paper proposes a dual context-aware network that combines hierarchical semantic aggregation on the text side with dynamic temporal attention on the video side. It also adds a CLIP-based distillation strategy, and together these components improve partially relevant video retrieval accuracy.
Findings
Outperforms state-of-the-art methods on PRVR benchmarks.
Performs especially well in low moment-to-video ratio scenarios, where the relevant moment covers only a small fraction of the video.
Demonstrates effective semantic and temporal focus in video retrieval.
Abstract
Retrieving partially relevant segments from untrimmed videos remains difficult due to two persistent challenges: the mismatch in information density between text and video segments, and limited attention mechanisms that overlook semantic focus and event correlations. We present KDC-Net, a Knowledge-Refined Dual Context-Aware Network that tackles these issues from both textual and visual perspectives. On the text side, a Hierarchical Semantic Aggregation module captures and adaptively fuses multi-scale phrase cues to enrich query semantics. On the video side, a Dynamic Temporal Attention mechanism employs relative positional encoding and adaptive temporal windows to highlight key events with local temporal coherence. Additionally, a dynamic CLIP-based distillation strategy, enhanced with temporal-continuity-aware refinement, ensures segment-aware and objective-aligned knowledge transfer.…
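The Dynamic Temporal Attention idea (relative positional encoding plus a temporal window that enforces local coherence) can be illustrated with a small sketch. This is not the authors' implementation. It is a generic locally windowed self-attention over clip features with an additive relative-position bias, and it rests on three assumptions:

- The window half-width `window` is fixed here, whereas KDC-Net adapts it. The abstract does not say how.
- The bias function `rel_bias` is a hypothetical stand-in for a learned relative positional encoding.
- All names (`local_temporal_attention`, `rel_bias`, `window`) are illustrative.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_temporal_attention(
    clips: List[Vector],
    window: int,
    rel_bias: Callable[[int], float],
) -> List[Vector]:
    """Self-attention over clip features restricted to a local temporal window.

    clips:    T clip-level feature vectors, all of dimension d.
    window:   half-width w (fixed in this sketch); clip i attends only to
              clips j with |i - j| <= w.
    rel_bias: additive bias depending only on the offset (j - i); a
              hypothetical stand-in for a learned relative positional encoding.
    """
    T, d = len(clips), len(clips[0])
    scale = 1.0 / math.sqrt(d)
    outputs: List[Vector] = []
    for i in range(T):
        neighbors = range(max(0, i - window), min(T, i + window + 1))
        # Scaled content similarity plus a bias that depends only on temporal offset.
        scores = [scale * dot(clips[i], clips[j]) + rel_bias(j - i) for j in neighbors]
        weights = softmax(scores)
        # Each output is a convex combination of the neighboring clips.
        outputs.append(
            [sum(w * clips[j][k] for w, j in zip(weights, neighbors)) for k in range(d)]
        )
    return outputs


if __name__ == "__main__":
    clips = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    # Hypothetical bias: nearer clips receive a larger (less negative) bias.
    decay = lambda offset: -0.5 * abs(offset)
    for row in local_temporal_attention(clips, window=1, rel_bias=decay):
        print([round(x, 3) for x in row])
```

With `window=0`, every clip attends only to itself and the features pass through unchanged. Widening the window mixes each clip with its temporal neighbors, and the decaying bias keeps the closest neighbors dominant. This is one simple way to favor local temporal coherence. The paper's adaptive window would replace the fixed `window` with a per-query or per-clip choice.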
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
