NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-matching Reference Audio Quality Assessment

Alessandro Ragano; Jan Skoglund; Andrew Hines

arXiv:2309.16284 · cs.SD · January 22, 2024

TL;DR

NOMAD introduces an unsupervised, differentiable perceptual similarity metric for audio that effectively assesses quality and degradation without human labels, outperforming existing non-matching reference methods.

Contribution

The paper proposes NOMAD, an unsupervised deep embedding method for perceptual audio similarity. Training is guided by the Neurogram Similarity Index Measure (NSIM), and the resulting metric applies to both quality assessment and speech enhancement.

Findings

01

Outperforms other non-matching reference methods at ranking degradation intensity and predicting speech quality

02

Achieves results competitive with full-reference audio metrics

03

Demonstrates effectiveness in speech enhancement and synthesis tasks

Abstract

This paper presents NOMAD (Non-Matching Audio Distance), a differentiable perceptual similarity metric that measures the distance of a degraded signal against non-matching references. The proposed method is based on learning deep feature embeddings via a triplet loss guided by the Neurogram Similarity Index Measure (NSIM) to capture degradation intensity. During inference, the similarity score between any two audio samples is computed as the Euclidean distance between their embeddings. NOMAD is fully unsupervised and can be used in general perceptual audio tasks, both for audio analysis (e.g., quality assessment) and for generative tasks such as speech enhancement and speech synthesis. The proposed method is evaluated on three tasks: ranking degradation intensity, predicting speech quality, and serving as a loss function for speech enhancement. Results indicate NOMAD outperforms other non-matching reference…
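
The two mechanisms named in the abstract, a margin-based triplet loss over embeddings and a Euclidean distance score at inference, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the margin value, the rule that the candidate with the closer NSIM score acts as the positive, and the averaging over several reference embeddings are all assumptions made for illustration; embeddings are plain lists of floats rather than network outputs.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def order_by_nsim(anchor_nsim, cand_a, cand_b):
    """Assumed NSIM guidance: the candidate whose NSIM score is closer to the
    anchor's is treated as the positive, the other as the negative.
    Each candidate is a hypothetical (nsim_score, embedding) pair."""
    (nsim_a, emb_a), (nsim_b, emb_b) = cand_a, cand_b
    if abs(anchor_nsim - nsim_a) <= abs(anchor_nsim - nsim_b):
        return emb_a, emb_b
    return emb_b, emb_a


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive. The margin is an assumption."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def non_matching_distance(test_emb, reference_embs):
    """Inference-time score: mean Euclidean distance from a test embedding to
    a set of reference embeddings that need not share its content.
    Averaging over references is an assumption of this sketch."""
    return sum(euclidean(test_emb, r) for r in reference_embs) / len(reference_embs)
```

With these definitions, a larger `non_matching_distance` would be read as a stronger degradation relative to the references, and `triplet_loss` would be minimized during training so that embedding distances follow the NSIM ordering.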

Code & Models

Repositories

alessandroragano/nomad
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing