Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding
Xirong Li, Aozhu Chen, Ziyue Wang, Fan Hu, Kaibin Tian, Xinru Chen, Chengbo Dong

TL;DR
This paper presents two techniques for ad-hoc video search, Lightweight Attentional Feature Fusion (LAFF) for combining diverse features and Bidirectional Negation Learning (BNL) for handling negation cues in queries. The resulting system placed second in the TRECVID 2022 AVS task.
Contribution
Introduces Lightweight Attentional Feature Fusion and Bidirectional Negation Learning for improved video retrieval, leveraging diverse features and negation cues in a unified framework.
Findings
Achieved second place in the TRECVID 2022 AVS task with an inferred average precision (infAP) of 0.262.
LAFF is much more compact than multi-head self-attention for feature fusion, yet more effective.
BNL effectively models negation cues in video search.
Abstract
We summarize our TRECVID 2022 Ad-hoc Video Search (AVS) experiments. Our solution is built with two new techniques, namely Lightweight Attentional Feature Fusion (LAFF) for combining diverse visual/textual features and Bidirectional Negation Learning (BNL) for addressing queries that contain negation cues. In particular, LAFF performs feature fusion at both early and late stages and at both text and video ends to exploit diverse (off-the-shelf) features. Compared to multi-head self-attention, LAFF is much more compact yet more effective. Its attentional weights can also be used for selecting fewer features, with the retrieval performance mostly preserved. BNL trains a negation-aware video retrieval model by minimizing a bidirectionally constrained loss per triplet, where a triplet consists of a given training video, its original description and a partially negated description. For…
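The attentional fusion idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that each feature is projected to a common dimension with a tanh-activated linear map, that one shared vector scores each projected feature, and that the fused vector is the softmax-weighted sum. The class name `AttentionalFusion`, the feature dimensions, and the random weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class AttentionalFusion:
    """Illustrative attention-weighted fusion of k features (assumed design)."""

    def __init__(self, input_dims, d=8, seed=0):
        rng = np.random.default_rng(seed)
        # One projection per input feature, mapping dim_i -> d (random, untrained).
        self.projs = [rng.normal(scale=0.1, size=(k, d)) for k in input_dims]
        # A single shared scoring vector: far fewer parameters than
        # the query/key/value matrices of multi-head self-attention.
        self.w_att = rng.normal(scale=0.1, size=(d,))

    def __call__(self, feats):
        # feats: list of arrays, each of shape (batch, dim_i).
        h = np.stack(
            [np.tanh(f @ W) for f, W in zip(feats, self.projs)], axis=1
        )  # (batch, k, d)
        scores = h @ self.w_att              # (batch, k)
        weights = softmax(scores, axis=1)    # one weight per feature, sums to 1
        fused = (weights[..., None] * h).sum(axis=1)  # (batch, d)
        return fused, weights


# Hypothetical usage: fuse a 512-d and a 2048-d feature for a batch of 2 items.
rng = np.random.default_rng(1)
feats = [rng.normal(size=(2, 512)), rng.normal(size=(2, 2048))]
fusion = AttentionalFusion([512, 2048], d=8)
fused, weights = fusion(feats)
```

Because the weights form a distribution over the input features, averaging them over a dataset gives a rough importance ranking. That is one plausible reading of how the attentional weights could guide the selection of fewer features.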
Taxonomy
Topics: Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research · Domain Adaptation and Few-Shot Learning
Methods: BLIP (Bootstrapping Language-Image Pre-training) · CLIP (Contrastive Language-Image Pre-training)
