Ambiguity-Restrained Text-Video Representation Learning for Partially Relevant Video Retrieval
CH Cho, WJ Moon, W Jun, MS Jung, JP Heo

TL;DR
This paper introduces ARL, a framework for partially relevant video retrieval (PRVR) that explicitly models ambiguity in text-video pairs and improves retrieval accuracy through hierarchical and fine-grained learning.
Contribution
The paper proposes Ambiguity-Restrained representation Learning (ARL), incorporating ambiguity detection and multi-level semantic modeling for better PRVR performance.
Findings
ARL effectively detects ambiguous pairs using uncertainty and similarity criteria.
Hierarchical learning improves semantic understanding of ambiguous text-video pairs.
Fine-grained frame-level modeling enhances retrieval accuracy in untrimmed videos.
Abstract
Partially Relevant Video Retrieval (PRVR) aims to retrieve a video where a specific segment is relevant to a given text query. Typical training processes of PRVR assume a one-to-one relationship where each text query is relevant to only one video. However, we point out the inherent ambiguity between text and video content based on their conceptual scope and propose a framework that incorporates this ambiguity into the model learning process. Specifically, we propose Ambiguity-Restrained representation Learning (ARL) to address ambiguous text-video pairs. Initially, ARL detects ambiguous pairs based on two criteria: uncertainty and similarity. Uncertainty represents whether instances include commonly shared context across the dataset, while similarity indicates pair-wise semantic overlap. Then, with the detected ambiguous pairs, our ARL hierarchically learns the semantic relationship via…
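The detection step in the abstract (flagging ambiguous pairs by uncertainty and similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_pairs`, the two thresholds, the precomputed per-instance uncertainty scores, and the rule that a non-matching pair counts as a negative only when both its similarity and its uncertainty are low are all assumptions made for illustration. The abstract does not specify how the two criteria are computed or combined.

```python
from typing import List


def label_pairs(
    sim: List[List[float]],
    text_unc: List[float],
    video_unc: List[float],
    sim_thresh: float = 0.7,   # hypothetical threshold, not from the paper
    unc_thresh: float = 0.5,   # hypothetical threshold, not from the paper
) -> List[List[str]]:
    """Label every (text i, video j) pair in a batch.

    sim[i][j]   : similarity between text i and video j
    text_unc[i] : assumed precomputed uncertainty score for text i
    video_unc[j]: assumed precomputed uncertainty score for video j

    The annotated pair (i == j) is 'positive'. A non-matching pair is
    'ambiguous' if it is highly similar or involves an uncertain instance
    (assumed combination rule); otherwise it is 'negative'.
    """
    labels: List[List[str]] = []
    for i, row in enumerate(sim):
        row_labels: List[str] = []
        for j, s in enumerate(row):
            if i == j:
                row_labels.append("positive")
            elif s >= sim_thresh or max(text_unc[i], video_unc[j]) >= unc_thresh:
                row_labels.append("ambiguous")
            else:
                row_labels.append("negative")
        labels.append(row_labels)
    return labels
```

Under this sketch, only pairs labeled 'negative' would be pushed apart during training, while 'ambiguous' pairs would be withheld from the negative set rather than assumed irrelevant, which is the one-to-one assumption the abstract argues against.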
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
