Low-Complexity Own Voice Reconstruction for Hearables with an In-Ear Microphone
Mattes Ohlenbusch, Christian Rollwage, Simon Doclo

TL;DR
This paper introduces a low-complexity deep learning system for reconstructing a user's own voice in noisy environments using hearables, improving speech quality with limited device-specific data.
Contribution
It proposes low-complexity variants of an existing deep learning-based own voice reconstruction system tailored for hearables with limited resources.
Findings
Significant speech quality improvement demonstrated in simulations
Effective data augmentation reduces need for extensive device-specific recordings
Low-complexity models perform well under resource constraints
Abstract
Hearable devices, equipped with one or more microphones, are commonly used for speech communication. Here, we consider the scenario where a hearable is used to capture the user's own voice in a noisy environment. In this scenario, own voice reconstruction (OVR) is essential for enhancing the quality and intelligibility of the recorded noisy own voice signals. In previous work, we developed a deep learning-based OVR system, aiming to reduce the amount of device-specific recordings for training by using data augmentation with phoneme-dependent models of own voice transfer characteristics. Given the limited computational resources available on hearables, in this paper we propose low-complexity variants of an OVR system based on the FT-JNF architecture and investigate the required amount of device-specific recordings for effective data augmentation and fine-tuning. Simulation results show…
Taxonomy
Topics: Speech and Audio Processing
