RF-GML: Reference-Free Generative Machine Listener

Arijit Biswas; Guanxin Jiang

arXiv:2409.10210 · eess.AS · December 24, 2024

Open Access

TL;DR

RF-GML is a reference-free audio quality metric that predicts subjective quality scores for coded mono, stereo, and binaural audio at 48 kHz, using transfer learning from the full-reference Generative Machine Listener (GML).

Contribution

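The paper introduces RF-GML, a reference-free audio quality assessment model obtained by transfer learning from the full-reference GML with minimal architectural modifications, and reports accurate prediction of subjective quality scores across diverse content types and codecs.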

Findings

01

Predicts subjective quality scores more accurately than existing reference-free (RF) models across diverse content types and codecs.

02

Effectively distinguishes different levels of coding artifacts.

03

Evaluates coded mono, stereo, and binaural audio at a 48 kHz sample rate.

Abstract

This paper introduces a novel reference-free (RF) audio quality metric called the RF-Generative Machine Listener (RF-GML), designed to evaluate coded mono, stereo, and binaural audio at a 48 kHz sample rate. RF-GML leverages transfer learning from a state-of-the-art full-reference (FR) Generative Machine Listener (GML) with minimal architectural modifications. The term "generative" refers to the model's ability to generate an arbitrary number of simulated listening scores. Unlike existing RF models, RF-GML accurately predicts subjective quality scores across diverse content types and codecs. Extensive evaluations demonstrate its superiority in rating unencoded audio and distinguishing different levels of coding artifacts. RF-GML's performance and versatility make it a valuable tool for coded audio quality assessment and monitoring in various applications, all without the need for a…
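The "generative" property described in the abstract, producing an arbitrary number of simulated listening scores, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `simulate_listening_scores`, the Gaussian form of the score distribution, the `mean` and `scale` parameters standing in for a model's per-item output, and the 0 to 100 score range.

```python
import random
import statistics


def simulate_listening_scores(mean, scale, n_listeners, seed=None):
    """Draw simulated listener scores from a hypothetical predicted distribution.

    Assumption: the model outputs a mean and a scale per audio item and the
    scores are Gaussian. The paper's actual distribution may differ.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_listeners):
        raw = rng.gauss(mean, scale)
        # Clip to an assumed 0..100 quality scale.
        scores.append(min(100.0, max(0.0, raw)))
    return scores


# Hypothetical predicted parameters for one coded audio item.
scores = simulate_listening_scores(mean=72.0, scale=8.0, n_listeners=1000, seed=0)
print(len(scores), round(statistics.mean(scores), 1))
```

Because the scores are sampled rather than read off as a single number, the same predicted parameters can stand in for a listening panel of any size, and summary statistics such as a mean or a spread can be computed from the samples.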

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing