Learning video retrieval models with relevance-aware online mining

Alex Falcon; Giuseppe Serra; Oswald Lanz

arXiv:2203.08688·cs.CV·March 17, 2022

TL;DR

This paper introduces Relevance-Aware Negatives and Positives mining (RANP), a novel method for improving cross-modal video retrieval by better selecting training samples based on semantic relevance, leading to state-of-the-art results.

Contribution

The paper proposes RANP, a new technique for selecting negatives and positives in training video-text retrieval models, addressing the issue of wrongly penalizing valid positives.

Findings

01. Achieves +5.3% nDCG on EPIC-Kitchens-100

02. Achieves +3.0% mAP on EPIC-Kitchens-100

03. Improves retrieval performance by better sample mining

Abstract

Given the number of videos and related captions uploaded every hour, deep learning-based solutions for cross-modal video retrieval are attracting increasing attention. A typical approach consists of learning a joint text-video embedding space in which the similarity between a video and its associated caption is maximized, while a lower similarity is enforced with all other captions, called negatives. This approach assumes that only the video-caption pairs in the dataset are valid, but other captions (positives) may also describe a video's visual contents, so some of them may be wrongly penalized. To address this shortcoming, we propose Relevance-Aware Negatives and Positives mining (RANP), which improves the selection of negatives based on their semantics while also increasing the similarity of other valid positives. We explore the influence of these techniques on two…
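
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hinge-style triplet formulation, the `relevance` matrix, the threshold `tau`, and the `margin` value are all assumptions introduced here for illustration. The sketch only shows the general principle that captions relevant to a video are excluded from the negatives and are instead treated as additional positives.

```python
from typing import List


def relevance_aware_triplet_loss(
    sim: List[List[float]],
    relevance: List[List[float]],
    margin: float = 0.2,  # hypothetical margin, not taken from the paper
    tau: float = 0.5,     # hypothetical relevance threshold
) -> float:
    """Illustrative video-to-caption hinge loss with relevance-aware mining.

    sim[i][j]       similarity between video i and caption j
    relevance[i][j] assumed semantic relevance of caption j to video i, in [0, 1]
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Negatives: only captions that are NOT relevant to video i.
        negatives = [j for j in others if relevance[i][j] < tau]
        # Extra positives: other captions that also describe video i.
        positives = [j for j in others if relevance[i][j] >= tau]
        if not negatives:
            continue
        # Hardest negative: the irrelevant caption most similar to the video.
        hardest_neg = max(sim[i][j] for j in negatives)
        # Keep the paired caption above the hardest negative by the margin.
        total += max(0.0, margin + hardest_neg - sim[i][i])
        if positives:
            # Hardest positive: the relevant caption least similar to the video.
            hardest_pos = min(sim[i][j] for j in positives)
            total += max(0.0, margin + hardest_neg - hardest_pos)
    return total / n


# Toy batch: captions 0 and 1 both describe videos 0 and 1.
sim = [
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9],
]
relevance = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Baseline assumption: only the paired caption counts as relevant.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

aware_loss = relevance_aware_triplet_loss(sim, relevance)
naive_loss = relevance_aware_triplet_loss(sim, identity)
```

On this toy batch, the baseline that treats every non-paired caption as a negative yields a positive loss (about 0.067), because it penalizes video 0 for being close to caption 1 even though that caption also describes it. With the relevance matrix supplied, the same similarities give a loss of 0.0.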

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques