A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining

Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Xubo Liu, Wenbo Wang, Shuhan Qi, Kejia Zhang, Jianyuan Sun, and Wenwu Wang

arXiv:2407.04936 · cs.SD · January 7, 2025

Open Access · 1 Repo

TL;DR

This paper proposes CLAPScore, a reference-free evaluation metric for language-queried audio source separation (LASS) that scores the semantic similarity between the separated audio and the text query, so that content relevance is taken into account.

Contribution

The paper introduces CLAPScore, a reference-free evaluation metric that uses a contrastive language-audio pretraining (CLAP) module to assess language-queried audio source separation (LASS) systems by the semantic relevance of the separated audio to the text query.

Findings

01  CLAPScore correlates well with human judgment of audio relevance.

02  It outperforms traditional SDR metrics in content-based evaluation.

03  The metric is publicly available for research use.

Abstract

Language-queried audio source separation (LASS) aims to separate an audio source guided by a text query, with the signal-to-distortion ratio (SDR)-based metrics being commonly used to objectively measure the quality of the separated audio. However, the SDR-based metrics require a reference signal, which is often difficult to obtain in real-world scenarios. In addition, with the SDR-based metrics, the content information of the text query is not considered effectively in LASS. This paper introduces a reference-free evaluation metric using a contrastive language-audio pretraining (CLAP) module, termed CLAPScore, which measures the semantic similarity between the separated audio and the text query. Unlike SDR, the proposed CLAPScore metric evaluates the quality of the separated audio based on the content information of the text query, without needing a reference signal. Experiments show…
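As a rough illustration of the contrast the abstract draws, the sketch below places a plain SDR computation, which needs the reference signal, next to an embedding-similarity score, which needs only the separated audio and the text query. It assumes cosine similarity as the similarity measure and takes the embeddings as plain lists of floats; in practice they would come from a CLAP audio encoder and text encoder, which are not shown here. The function names sdr_db and embedding_similarity are hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float],
           eps: float = 1e-12) -> float:
    """Plain signal-to-distortion ratio in dB.

    Requires the ground-truth reference signal, which is the limitation
    the abstract points out for SDR-based metrics.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def embedding_similarity(audio_emb: Sequence[float], text_emb: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Cosine similarity between an audio embedding and a text embedding.

    Assumption: both vectors live in the same joint embedding space, as a
    contrastively trained language-audio model would provide. No reference
    signal is involved.
    """
    if len(audio_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * t for a, t in zip(audio_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in audio_emb)) * math.sqrt(
        sum(t * t for t in text_emb))
    return dot / (norm + eps)


if __name__ == "__main__":
    # Toy waveform samples: SDR compares the estimate against the reference.
    print(sdr_db([1.0, 1.0], [1.0, 0.0]))

    # Toy embeddings standing in for encoder outputs (placeholder values).
    separated_audio_emb = [0.2, 0.9, 0.1]
    query_text_emb = [0.1, 0.8, 0.3]
    print(embedding_similarity(separated_audio_emb, query_text_emb))
```

The two functions take different inputs by design: sdr_db cannot be called without a reference waveform, while embedding_similarity only needs the separated output and the query, which is the property that makes a score of this kind usable when no reference exists.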

Code & Models

Repositories

littleflyingsheep/clapscore_for_lass
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis