Speech-dependent Data Augmentation for Own Voice Reconstruction with Hearable Microphones in Noisy Environments
Mattes Ohlenbusch, Christian Rollwage, Simon Doclo

TL;DR
This paper introduces speech-dependent data augmentation for training own voice reconstruction systems for hearables in noisy environments. Additional own voice training signals are simulated from available speech data using phoneme-dependent own voice transfer characteristics between the hearable microphones.
Contribution
The paper presents novel speech-dependent augmentation techniques that estimate transfer functions from limited data to generate more training samples for voice reconstruction.
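As a rough illustration of what estimating a transfer function from a small set of paired recordings can look like, the sketch below computes a per-frequency least-squares estimate of a relative transfer function from time-aligned STFT frames of the outer and in-ear microphones. The function name `estimate_rtf`, the frame layout (lists of complex coefficients), and the regularization constant `eps` are assumptions made for this example; the paper's actual estimator may differ.

```python
def estimate_rtf(outer_frames, inner_frames, eps=1e-12):
    """Least-squares estimate of a per-bin relative transfer function.

    outer_frames, inner_frames: time-aligned lists of STFT frames, where
    each frame is a list of complex coefficients (one per frequency bin).
    Returns one complex gain per bin, so that inner[k] ~ H[k] * outer[k].
    (Hypothetical helper for illustration, not the authors' code.)
    """
    n_bins = len(outer_frames[0])
    cross = [0j] * n_bins   # accumulated cross-spectrum inner * conj(outer)
    power = [0.0] * n_bins  # accumulated outer-microphone power
    for y_out, y_in in zip(outer_frames, inner_frames):
        for k in range(n_bins):
            cross[k] += y_in[k] * y_out[k].conjugate()
            power[k] += abs(y_out[k]) ** 2
    # eps guards against division by zero in bins with no energy
    return [cross[k] / (power[k] + eps) for k in range(n_bins)]
```

For a speech-dependent variant, one could call such a function separately on the frames belonging to each phoneme, which yields one transfer function per phoneme.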
Findings
Speech-dependent augmentation outperforms other methods.
Fine-tuning further enhances reconstruction quality.
Transfer characteristics enable realistic voice simulation.
Abstract
Own voice pickup for hearables in noisy environments benefits from using both an outer and an in-ear microphone outside and inside the occluded ear. Due to environmental noise recorded at both microphones, and amplification of the own voice at low frequencies and band-limitation at the in-ear microphone, an own voice reconstruction system is needed to enable communication. A large amount of own voice signals is required to train a supervised deep learning-based own voice reconstruction system. Training data can either be obtained by recording a large amount of own voice signals of different talkers with a specific device, which is costly, or through augmentation of available speech data. Own voice signals can be simulated by assuming a linear time-invariant relative transfer function between hearable microphones for each phoneme, referred to as own voice transfer characteristics. In…
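The simulation idea in the abstract (a linear time-invariant relative transfer function per phoneme) can be sketched as follows. This is a minimal example under stated assumptions, not the authors' implementation: it assumes STFT frames of a clean speech signal, one phoneme label per frame, and a hypothetical dictionary `rtf_by_phoneme` that maps each phoneme to a list of complex per-bin gains. Switching the transfer function frame by frame is also an assumption of this sketch.

```python
def simulate_in_ear(speech_frames, phoneme_labels, rtf_by_phoneme):
    """Simulate in-ear own voice frames from clean speech frames.

    speech_frames: list of STFT frames (lists of complex coefficients).
    phoneme_labels: one phoneme label per frame (assumed to be available,
        e.g. from a forced alignment).
    rtf_by_phoneme: hypothetical dict mapping phoneme -> list of complex
        gains, one per frequency bin.
    """
    simulated = []
    for frame, phoneme in zip(speech_frames, phoneme_labels):
        rtf = rtf_by_phoneme[phoneme]
        # Apply the phoneme's transfer function bin by bin
        simulated.append([h * x for h, x in zip(rtf, frame)])
    return simulated


# Usage with toy values: two frequency bins, two phonemes
rtfs = {"a": [2 + 0j, 0.1 + 0j], "s": [0.5 + 0j, 0.05 + 0j]}
frames = [[1 + 1j, 1 - 1j], [2 + 0j, 0 + 2j]]
in_ear = simulate_in_ear(frames, ["a", "s"], rtfs)
```

In the toy values, the second bin is attenuated far more than the first, which loosely mimics the band-limitation at the in-ear microphone described above; the numbers themselves are made up.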
