Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning

Huang Xie, Khazar Khorrami, Okko Räsänen, Tuomas Virtanen

arXiv:2408.14939·eess.AS·August 28, 2024

TL;DR

This paper proposes an audio-text relevance learning method that combines continuous human relevance ratings with binary labels, improving language-based audio retrieval performance and supporting an analysis of the factors that influence relevance ratings.

Contribution

It introduces a combined learning approach using listwise ranking and contrastive objectives to leverage both relevance types, enhancing audio-text relevance models.

Findings

01 Improved language-based audio retrieval accuracy

02 Effective integration of continuous and binary relevance data

03 Insights into factors influencing relevance ratings

Abstract

Audio-text relevance learning refers to learning the shared semantic properties of audio samples and textual descriptions. The standard approach uses binary relevances derived from pairs of audio samples and their human-provided captions, categorizing each pair as either positive or negative. This may result in suboptimal systems due to varying levels of relevance between audio samples and captions. In contrast, a recent study used human-assigned relevance ratings, i.e., continuous relevances, for these pairs but did not obtain performance gains in audio-text relevance learning. This work introduces a relevance learning method that utilizes both human-assigned continuous relevance ratings and binary relevances using a combination of a listwise ranking objective and a contrastive learning objective. Experimental results demonstrate the effectiveness of the proposed method, showing…
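
The combination described in the abstract can be illustrated with a small sketch. The abstract does not give the exact losses, so everything concrete here is an assumption: a ListNet-style top-one cross-entropy stands in for the listwise ranking objective, a symmetric InfoNCE loss stands in for the contrastive objective, and the names `listwise_loss`, `contrastive_loss`, `combined_loss`, the weight `alpha`, and the `temperature` value are hypothetical rather than taken from the paper.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    x = np.asarray(x, dtype=float)
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def contrastive_loss(sim, temperature=0.07):
    """Symmetric InfoNCE over a batch (assumed form, not the paper's).

    sim[i, j] is the model similarity between audio i and caption j.
    Diagonal pairs are the binary positives; all other pairs are negatives.
    """
    logits = np.asarray(sim, dtype=float) / temperature
    idx = np.arange(logits.shape[0])
    audio_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    text_to_audio = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (audio_to_text + text_to_audio)


def listwise_loss(scores, ratings):
    """ListNet-style top-one cross-entropy (assumed form, not the paper's).

    Each row is one list, e.g. one caption scored against several audio
    clips. `ratings` holds the human-assigned continuous relevances and
    `scores` the model similarities for the same pairs.
    """
    target = np.exp(log_softmax(ratings, axis=1))  # ratings as a distribution
    return -(target * log_softmax(scores, axis=1)).sum(axis=1).mean()


def combined_loss(sim, scores, ratings, alpha=0.5):
    """Weighted sum of the two objectives; alpha is a hypothetical weight."""
    return alpha * listwise_loss(scores, ratings) + (1.0 - alpha) * contrastive_loss(sim)


if __name__ == "__main__":
    sim = np.array([[0.9, 0.1, 0.2], [0.0, 0.8, 0.3], [0.1, 0.2, 0.7]])
    ratings = np.array([[0.9, 0.4, 0.1], [0.2, 0.8, 0.5]])
    scores = np.array([[0.7, 0.5, 0.0], [0.1, 0.6, 0.4]])
    print(combined_loss(sim, scores, ratings))
```

In this sketch the contrastive term only sees which pairs are positives, while the listwise term rewards rankings that follow the graded human ratings, which is the distinction the abstract draws between binary and continuous relevances.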

Taxonomy

Topics: Music and Audio Processing

Methods: Contrastive Learning