Towards noise-robust speech inversion through multi-task learning with speech enhancement

Saba Tabatabaee; Carol Espy-Wilson

arXiv:2601.14516·eess.AS·January 22, 2026

Open Access

TL;DR

This paper introduces a joint framework combining speech enhancement and speech inversion using shared SSL-based representations, significantly improving robustness to background noise in real-world scenarios.

Contribution

It proposes a unified multi-task learning approach that enhances speech inversion performance under noisy conditions by integrating speech enhancement with SSL-based representations.
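As a rough illustration of what a joint objective over the two tasks could look like, here is a minimal sketch. The specific loss terms (L1 for enhancement, mean squared error for inversion), the weighted-sum combination, and the names `joint_loss` and `si_weight` are assumptions made for illustration; the summary above does not state which losses or weights the authors use.

```python
import numpy as np

def joint_loss(enhanced, clean, params_pred, params_true, si_weight=1.0):
    """Hypothetical multi-task objective: SE loss plus weighted SI loss.

    enhanced, clean:          waveforms (or spectra) of identical shape
    params_pred, params_true: estimated and reference articulatory
                              parameters of identical shape
    si_weight:                assumed scalar that balances the two tasks
    """
    enhanced = np.asarray(enhanced, dtype=float)
    clean = np.asarray(clean, dtype=float)
    params_pred = np.asarray(params_pred, dtype=float)
    params_true = np.asarray(params_true, dtype=float)

    # Assumed enhancement term: mean absolute error against clean speech.
    se_loss = np.mean(np.abs(enhanced - clean))
    # Assumed inversion term: mean squared error on the parameters.
    si_loss = np.mean((params_pred - params_true) ** 2)

    # Both terms would backpropagate into the shared SSL encoder, which is
    # how joint training can shape a representation useful for both tasks.
    return float(se_loss + si_weight * si_loss)
```

In a real training loop the two inputs to each term would come from separate heads on top of the same SSL features, and the balance between tasks would be tuned empirically.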

Findings

01. 80.95% relative improvement over the baseline on speech inversion under babble noise at -5 dB SNR

02. 38.98% relative improvement over the baseline on speech inversion under non-babble noise at -5 dB SNR

03. Joint training benefits both the speech enhancement and speech inversion tasks

Abstract

Recent studies demonstrate the effectiveness of Self-Supervised Learning (SSL) speech representations for Speech Inversion (SI). However, applying SI in real-world scenarios remains challenging due to the pervasive presence of background noise. We propose a unified framework that integrates Speech Enhancement (SE) and SI models through shared SSL-based speech representations. In this framework, the SSL model is trained not only to support the SE module in suppressing noise but also to produce representations that are more informative for the SI task, allowing both modules to benefit from joint training. At a Signal-to-Noise Ratio (SNR) of -5 dB, our method achieves relative improvements on the SI task over the baseline of 80.95% under babble noise and 38.98% under non-babble noise, as measured by the average Pearson product-moment correlation across all estimated parameters.
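The evaluation metric described in the abstract can be sketched as follows: a Pearson correlation is computed per estimated parameter, averaged across parameters, and the proposed system is then compared with the baseline as a relative percentage. The function names, the (frames x parameters) array layout, and the example scores are assumptions for illustration; they are not taken from the paper.

```python
import numpy as np

def average_pearson(pred, target):
    """Mean Pearson correlation over columns.

    pred, target: arrays of shape (num_frames, num_parameters), one column
    per estimated parameter (layout assumed for this sketch).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Center each parameter trajectory over time.
    pred_c = pred - pred.mean(axis=0)
    target_c = target - target.mean(axis=0)
    # Per-parameter covariance and normalizer (shared 1/N factors cancel).
    num = (pred_c * target_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0))
    return float(np.mean(num / den))

def relative_improvement(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline

# Hypothetical scores, chosen only to show the arithmetic.
gain = relative_improvement(baseline=0.40, proposed=0.50)
```

With these made-up scores the gain is about 25%. The reported 80.95% and 38.98% figures are the same kind of ratio, computed from the paper's own baseline and proposed correlations.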

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation