Perceive and predict: self-supervised speech representation based loss functions for speech enhancement

George Close; William Ravenscroft; Thomas Hain; Stefan Goetze

arXiv:2301.04388 · cs.SD · June 27, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a novel loss function based on the distance between self-supervised speech feature encodings, which correlates with speech quality and improves speech enhancement performance.

Contribution

It demonstrates that using the distance between self-supervised speech feature encodings as a loss function improves speech enhancement models beyond what traditional spectrogram-based losses achieve.

Findings

01

Feature encoding distance correlates with speech quality and intelligibility.

02

Proposed loss improves PESQ and STOI scores over traditional methods.

03

Self-supervised representations are effective in speech enhancement tasks.

Abstract

Recent work in the domain of speech enhancement has explored the use of self-supervised speech representations to aid in the training of neural speech enhancement models. However, much of this work focuses on using the deepest or final outputs of self-supervised speech representation models, rather than the earlier feature encodings. The use of self-supervised representations in such a way is often not fully motivated. In this work, it is shown that the distance between the feature encodings of clean and noisy speech correlates strongly with psychoacoustically motivated measures of speech quality and intelligibility, as well as with human Mean Opinion Score (MOS) ratings. Experiments using this distance as a loss function are performed, and improved performance over the use of STFT spectrogram distance based loss as well as other common loss functions from speech enhancement literature is…
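
The idea of a loss based on feature encoding distance can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `toy_feature_encoder` is a hypothetical stand-in (a small strided 1-D convolution with ReLU and random weights) for the early feature encoder of a pretrained self-supervised speech model, the mean squared distance is one plausible choice of distance, and the sine-wave "speech" and synthetic noise are toy data. It is not the authors' implementation.

```python
import math
import random


def toy_feature_encoder(waveform, kernels, stride=4):
    """Hypothetical stand-in for a self-supervised model's feature encoder.

    Applies a strided 1-D convolution followed by ReLU and returns a list of
    frames, each holding one value per kernel (channel).
    """
    width = len(kernels[0])
    frames = []
    for start in range(0, len(waveform) - width + 1, stride):
        window = waveform[start:start + width]
        frames.append(
            [max(0.0, sum(w * x for w, x in zip(kernel, window)))
             for kernel in kernels]
        )
    return frames


def encoding_distance(estimate, reference, kernels):
    """Mean squared distance between the encodings of two waveforms."""
    enc_est = toy_feature_encoder(estimate, kernels)
    enc_ref = toy_feature_encoder(reference, kernels)
    total, count = 0.0, 0
    for frame_est, frame_ref in zip(enc_est, enc_ref):
        for a, b in zip(frame_est, frame_ref):
            total += (a - b) ** 2
            count += 1
    return total / count


rng = random.Random(0)
# Random weights stand in for pretrained encoder weights (assumption).
kernels = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]

# Toy "clean speech": a sine wave. The same noise pattern is added at two levels.
clean = [math.sin(2 * math.pi * 5 * n / 160) for n in range(160)]
noise = [rng.gauss(0.0, 1.0) for _ in range(160)]
lightly_noisy = [x + 0.05 * n for x, n in zip(clean, noise)]
heavily_noisy = [x + 0.5 * n for x, n in zip(clean, noise)]

loss_clean = encoding_distance(clean, clean, kernels)
loss_light = encoding_distance(lightly_noisy, clean, kernels)
loss_heavy = encoding_distance(heavily_noisy, clean, kernels)
print(loss_clean, loss_light, loss_heavy)
```

In this toy setup the distance is zero for identical signals and grows as more noise is added, which is the behaviour a training loss needs. In an actual training loop, the enhancement model's output would take the place of the noisy signals, and the encoder would typically be kept frozen while the distance is minimised.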

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques