EA-VTR: Event-Aware Video-Text Retrieval
Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu

TL;DR
EA-VTR enhances video-text retrieval by incorporating event-aware data augmentation and a novel model that captures detailed event content and temporal logic, leading to superior performance across multiple tasks.
Contribution
The paper introduces a new event augmentation strategy and an event-aware model, improving detailed event understanding and temporal alignment in video-text retrieval.
Findings
Outperforms existing methods on multiple datasets
Achieves superior event content perception
Demonstrates strong temporal logic understanding
Abstract
Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improvements from both data and model perspectives. In terms of pre-training data, we focus on supplementing the missing specific event content and event temporal transitions with the proposed event augmentation strategies. Based on the event-augmented data, we construct a novel Event-Aware Video-Text Retrieval model, i.e., EA-VTR, which achieves powerful video-text retrieval ability through superior video event awareness. EA-VTR can efficiently encode frame-level and video-level visual representations…
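For readers unfamiliar with the "video-level cross-modal contrastive learning" baseline that the abstract refers to, the following is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) loss over a batch of paired video and text embeddings. It is not EA-VTR's objective; the function names, the pure-Python vector representation, and the temperature value of 0.07 are assumptions chosen for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Generic video-level contrastive loss (illustrative, not the paper's).

    video_embs[i] and text_embs[i] are assumed to be a matched pair;
    every other pairing in the batch is treated as a negative.
    """
    v = [l2_normalize(x) for x in video_embs]
    t = [l2_normalize(x) for x in text_embs]
    # sim[i][j] = cosine similarity of video i and text j, scaled by temperature
    sim = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t] for vi in v]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-video direction
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


# Toy batch: two videos and two captions in a 2-D embedding space.
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(videos, [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss(videos, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Because this loss compares one pooled embedding per video with one per caption, it only rewards a correct match at the video level, which is consistent with the abstract's remark that such an objective struggles with detailed event alignment.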
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: Focus · Contrastive Learning
