Generic Speech Enhancement with Self-Supervised Representation Space Loss

Hiroshi Sato; Tsubasa Ochiai; Marc Delcroix; Takafumi Moriya; Takanori Ashihara; Ryo Masumura

arXiv:2507.07631·eess.AS·July 11, 2025

TL;DR

This paper introduces a self-supervised representation space loss for speech enhancement, enabling a generic model that improves multiple downstream speech tasks without task-specific tuning.

Contribution

The study proposes a novel training criterion that minimizes the distance between the enhanced signal and the clean signal in a self-supervised feature space, so that a single speech enhancement front-end generalizes across multiple downstream tasks.

Findings

01. Improves performance on multiple speech tasks
02. Maintains perceptual quality of enhanced speech
03. Enables task-agnostic speech enhancement models

Abstract

Single-channel speech enhancement is used in various tasks to mitigate the effect of interfering signals. Conventionally, to perform optimally, a speech enhancement model has had to be tuned for each task, which makes it challenging to generalize such models to unknown downstream tasks. This study aims to construct a generic speech enhancement front-end that improves the performance of back-ends across multiple downstream tasks. To this end, we propose a novel training criterion that minimizes the distance between the enhanced signal and the ground-truth clean signal in the feature representation domain of self-supervised learning models. Since self-supervised learning feature representations effectively express high-level speech information useful for solving various downstream tasks, the proposal is expected to make speech enhancement…
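The training criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frozen encoder here is a small random projection standing in for a pretrained self-supervised model, the distance is a mean absolute difference (the abstract does not say which distance is used), and the names `make_toy_encoder` and `representation_loss` are hypothetical.

```python
import math
import random


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def make_toy_encoder(frame_len, feat_dim, seed=0):
    """Return a frozen frame encoder with fixed random weights.

    ASSUMPTION: a toy stand-in for a pretrained self-supervised model;
    a real setup would load such a model and keep its weights fixed.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(frame_len)
    weights = [[rng.gauss(0.0, scale) for _ in range(frame_len)]
               for _ in range(feat_dim)]

    def encode(signal):
        frames = frame_signal(signal, frame_len, hop=frame_len // 2)
        return [[math.tanh(sum(w * x for w, x in zip(row, frame)))
                 for row in weights]
                for frame in frames]

    return encode


def representation_loss(encode, enhanced, clean):
    """Mean absolute distance between the features of two signals.

    ASSUMPTION: L1 is chosen only for illustration.
    """
    total, count = 0.0, 0
    for feat_enh, feat_cln in zip(encode(enhanced), encode(clean)):
        for a, b in zip(feat_enh, feat_cln):
            total += abs(a - b)
            count += 1
    return total / count


# Toy data: a clean sinusoid and two degraded versions of it.
rng = random.Random(1)
clean = [math.sin(2 * math.pi * 5 * t / 160) for t in range(160)]
noise = [rng.gauss(0.0, 0.5) for _ in clean]
noisy = [s + n for s, n in zip(clean, noise)]
mildly_noisy = [s + 0.2 * n for s, n in zip(clean, noise)]

encode = make_toy_encoder(frame_len=32, feat_dim=8)
loss_clean = representation_loss(encode, clean, clean)
loss_mild = representation_loss(encode, mildly_noisy, clean)
loss_noisy = representation_loss(encode, noisy, clean)
```

In a training loop, `enhanced` would be the output of the enhancement front-end, and the loss would be minimized with respect to the front-end's parameters only, with the encoder held fixed. In this toy setup the loss is zero for identical signals and shrinks as the residual noise shrinks.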
