Throat and acoustic paired speech dataset for deep learning-based speech enhancement

Yunsik Kim, Yonghun Song, and Yoonyoung Chung

arXiv:2502.11478·cs.SD·April 23, 2026

TL;DR

This paper introduces TAPS, a dataset of paired throat and acoustic microphone speech recordings, together with an alignment method, to support deep learning-based enhancement of throat microphone speech captured in noisy environments.

Contribution

The paper presents the first standardized dataset of paired throat and acoustic microphone speech, together with a method for aligning the two signals, giving deep learning-based enhancement of throat microphone recordings a common benchmark.

Findings

01

Mapping-based models outperform others in speech quality restoration

02

TAPS dataset effectively supports deep learning research in noisy environments

03

Optimal alignment improves signal matching between throat and acoustic recordings

Abstract

In high-noise environments such as factories, subways, and busy streets, capturing clear speech is challenging. Throat microphones can offer a solution because of their inherent noise-suppression capabilities; however, the passage of sound waves through skin and tissue attenuates high-frequency information, reducing speech clarity. Recent deep learning approaches have shown promise in enhancing throat microphone recordings, but further progress is constrained by the lack of a standard dataset. Here, we introduce the Throat and Acoustic Paired Speech (TAPS) dataset, a collection of paired utterances recorded from 60 native Korean speakers using throat and acoustic microphones. Furthermore, an optimal alignment approach was developed and applied to address the inherent signal mismatch between the two microphones. We tested three baseline deep learning models on the TAPS dataset and found…
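The signal mismatch mentioned in the abstract can be illustrated with a generic time-alignment sketch. The abstract does not describe how the authors' optimal alignment approach works, so the code below is not their method: it is a plain cross-correlation lag search, shown only to make the idea of aligning paired recordings concrete. The function names `estimate_lag` and `align_pair`, the `max_lag` search window, and the synthetic test signal are all assumptions introduced for illustration.

```python
import numpy as np


def estimate_lag(throat, acoustic, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means the throat signal is delayed relative to the
    acoustic signal, i.e. throat[t + lag] lines up with acoustic[t].
    Hypothetical helper: a generic technique, not the paper's method.
    """
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = throat[lag:], acoustic
        else:
            a, b = throat, acoustic[-lag:]
        n = min(len(a), len(b))
        score = float(np.dot(a[:n], b[:n]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_pair(throat, acoustic, lag):
    """Trim both signals so that sample i of each refers to the same instant."""
    if lag >= 0:
        throat = throat[lag:]
    else:
        acoustic = acoustic[-lag:]
    n = min(len(throat), len(acoustic))
    return throat[:n], acoustic[:n]


# Synthetic demo (assumed data): the "throat" channel is the acoustic
# channel delayed by 7 samples. Real throat recordings would also differ
# in spectral content, which this toy signal ignores.
rng = np.random.default_rng(0)
acoustic = rng.standard_normal(1000)
throat = np.concatenate([np.zeros(7), acoustic])[:1000]

lag = estimate_lag(throat, acoustic, max_lag=20)
throat_aligned, acoustic_aligned = align_pair(throat, acoustic, lag)
```

A single global lag is the simplest possible model of the mismatch. It was chosen here for brevity and makes no claim about how the TAPS recordings were actually aligned.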
