On Negative Sampling for Contrastive Audio-Text Retrieval
Huang Xie, Okko Räsänen, Tuomas Virtanen

TL;DR
This paper examines negative sampling strategies in contrastive audio-text retrieval, demonstrating that semi-hard negatives improve performance and highlighting issues like feature collapse with hard negatives.
Contribution
It introduces and evaluates various negative sampling strategies, especially semi-hard sampling with cross-modality scores, for contrastive audio-text retrieval.
Findings
Semi-hard negatives with cross-modality scores improve retrieval performance.
Performance varies significantly among different negative sampling strategies.
Hard negative sampling can cause feature collapse in the model.
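One rough way to watch for the feature collapse mentioned above is to track how similar the learned embeddings are to one another. The sketch below assumes that "collapse" means embeddings drifting toward a single direction. This is an interpretation, not a definition taken from the paper, and the helper name `mean_offdiag_cosine` is hypothetical. A value near 1 suggests the embeddings have become nearly indistinguishable.

```python
import numpy as np

def mean_offdiag_cosine(emb):
    """Average cosine similarity between all distinct pairs of embeddings.

    emb: array of shape (n, d) with n >= 2 and no all-zero rows.
    Hypothetical diagnostic: values close to 1.0 hint at collapsed features.
    """
    # L2-normalise each row so dot products become cosine similarities.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = sim.shape[0]
    # Drop the diagonal (self-similarity) before averaging.
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))
```

In practice, such a number could be logged per epoch for both the audio and the text encoder outputs and compared across sampling strategies.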
Abstract
This paper investigates negative sampling for contrastive learning in the context of audio-text retrieval. The strategy for negative sampling refers to selecting negatives (either audio clips or textual descriptions) from a pool of candidates for a positive audio-text pair. We explore sampling strategies via model-estimated within-modality and cross-modality relevance scores for audio and text samples. With a constant training setting on the retrieval system from [1], we study eight sampling strategies, including hard and semi-hard negative sampling. Experimental results show that retrieval performance varies dramatically among different strategies. Particularly, by selecting semi-hard negatives with cross-modality scores, the retrieval system gains improved performance in both text-to-audio and audio-to-text retrieval. Besides, we show that feature collapse occurs while sampling hard…
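To make the sampling idea in the abstract concrete, here is a minimal sketch of picking text negatives for each audio clip from cross-modality scores. It rests on several assumptions that do not come from the paper: scores are cosine similarities between embeddings, the positive caption for clip `i` sits at index `i`, a "hard" negative is the highest-scoring non-matching caption, and a "semi-hard" negative is the highest-scoring caption that still scores below the positive pair. The function names and the fallback rule are hypothetical.

```python
import numpy as np

def cross_modal_scores(audio_emb, text_emb):
    """Cosine similarity between every audio clip (rows) and caption (columns)."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return a @ t.T

def sample_text_negatives(scores, strategy="semi-hard"):
    """Pick one negative caption index per audio clip.

    scores: (n, n) array, n >= 2; scores[i, i] is the positive pair (assumed).
    """
    n = scores.shape[0]
    pos = np.diag(scores)
    # Exclude the positive caption from the candidate pool.
    masked = scores.copy()
    np.fill_diagonal(masked, -np.inf)

    if strategy == "hard":
        # Most similar non-matching caption.
        return masked.argmax(axis=1)

    if strategy == "semi-hard":
        # Keep only negatives that score below the positive pair.
        below = np.where(masked < pos[:, None], masked, -np.inf)
        picks = below.argmax(axis=1)
        # Assumed fallback: if no negative scores below the positive,
        # take the lowest-scoring negative instead.
        none_below = np.isinf(below).all(axis=1)
        lowest = np.where(np.eye(n, dtype=bool), np.inf, scores).argmin(axis=1)
        return np.where(none_below, lowest, picks)

    raise ValueError(f"unknown strategy: {strategy}")
```

Passing `scores.T` would select audio negatives for each caption in the same way. Within-modality variants would replace `scores` with an audio-audio or text-text similarity matrix.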
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods: Contrastive Learning
