Real-time speech enhancement in noise for throat microphone using neural audio codec as foundation model

Julien Hauret; Thomas Joubaud; Éric Bavu

arXiv:2508.02974·eess.AS·August 6, 2025

TL;DR

This paper presents a real-time demo that enhances throat microphone speech captured in noisy environments. It fine-tunes Kyutai's Mimi neural audio codec on the Vibravox dataset to compensate for the reduced bandwidth of body-conducted recordings.

Contribution

It showcases a complete pipeline, from throat microphone recording to deep learning-based post-processing, built on a fine-tuned neural audio codec that supports real-time inference.

Findings

01

Outperforms the state-of-the-art models it is compared against

02

Real-time inference with low latency

03

Throat microphone capture naturally attenuates external noise, at the cost of reduced audio bandwidth

Abstract

We present a real-time speech enhancement demo using speech captured with a throat microphone. This demo aims to showcase the complete pipeline, from recording to deep learning-based post-processing, for speech captured in noisy environments with a body-conducted microphone. The throat microphone records skin vibrations, which naturally attenuate external noise, but this robustness comes at the cost of reduced audio bandwidth. To address this challenge, we fine-tune Kyutai's Mimi, a neural audio codec supporting real-time inference, on Vibravox, a dataset containing paired air-conducted and throat microphone recordings. We compare this enhancement strategy against state-of-the-art models and demonstrate its superior performance. The inference runs in an interactive interface that allows users to toggle enhancement, visualize spectrograms, and monitor processing latency.
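
The frame-by-frame processing, enhancement toggle, and latency monitoring described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `passthrough` function is a hypothetical stand-in for the fine-tuned codec, the names `frames` and `run_stream` are invented for illustration, and the 24 kHz sample rate and 80 ms frame size are assumptions rather than values stated in the abstract.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple

# Assumed values for illustration; not taken from the abstract.
SAMPLE_RATE_HZ = 24_000
FRAME_SAMPLES = 1_920  # 80 ms of audio at 24 kHz

Frame = List[float]


def frames(signal: Sequence[float], frame_samples: int = FRAME_SAMPLES) -> Iterator[Frame]:
    """Yield fixed-size frames, zero-padding the final one."""
    for start in range(0, len(signal), frame_samples):
        chunk = list(signal[start:start + frame_samples])
        chunk.extend([0.0] * (frame_samples - len(chunk)))
        yield chunk


def passthrough(frame: Frame) -> Frame:
    """Hypothetical placeholder for the enhancement model: returns the input unchanged."""
    return list(frame)


def run_stream(
    signal: Sequence[float],
    enhance: Callable[[Frame], Frame] = passthrough,
    enabled: bool = True,
) -> Tuple[List[float], List[float]]:
    """Process a signal frame by frame and record a real-time factor per frame.

    The real-time factor is processing time divided by frame duration;
    values below 1.0 mean the frame was processed faster than it plays back.
    """
    frame_budget_s = FRAME_SAMPLES / SAMPLE_RATE_HZ
    output: List[float] = []
    real_time_factors: List[float] = []
    for frame in frames(signal):
        start = time.perf_counter()
        processed = enhance(frame) if enabled else frame  # the enhancement toggle
        elapsed_s = time.perf_counter() - start
        output.extend(processed)
        real_time_factors.append(elapsed_s / frame_budget_s)
    return output, real_time_factors


if __name__ == "__main__":
    one_second_of_silence = [0.0] * SAMPLE_RATE_HZ
    _, rtfs = run_stream(one_second_of_silence)
    print(f"frames: {len(rtfs)}, worst real-time factor: {max(rtfs):.4f}")
```

In a live setting the input would arrive from the audio device one frame at a time rather than from a preloaded list, and the per-frame real-time factor is the kind of quantity a latency monitor could display.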
