TL;DR
This paper introduces SEANet, a multi-modal speech enhancement model that uses accelerometer data to improve speech quality in noisy environments, demonstrating high-quality results even with interfering speech.
Contribution
SEANet is the first wave-to-wave convolutional model that incorporates accelerometer data for speech enhancement, leveraging multi-modal inputs and adversarial training.
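To make the training objective more concrete: the abstract describes it as a combination of feature losses and adversarial losses. The sketch below shows one way such a generator objective could be assembled. It is an illustration under assumptions, not the authors' implementation: the hinge form of the adversarial term, the L1 distance on discriminator activations, and the names `hinge_generator_loss`, `feature_loss`, `generator_objective`, and `feature_weight` are all hypothetical choices that the summary does not specify.

```python
import numpy as np

def hinge_generator_loss(fake_scores):
    """Hinge-style adversarial term (assumed form): penalize discriminator
    scores on enhanced audio that fall below 1."""
    fake_scores = np.asarray(fake_scores, dtype=np.float64)
    return float(np.mean(np.maximum(0.0, 1.0 - fake_scores)))

def feature_loss(real_feats, fake_feats):
    """Mean absolute difference between discriminator activations computed on
    clean and enhanced audio, averaged over layers (assumed form)."""
    per_layer = [
        np.mean(np.abs(np.asarray(r, dtype=np.float64) - np.asarray(f, dtype=np.float64)))
        for r, f in zip(real_feats, fake_feats)
    ]
    return float(np.mean(per_layer))

def generator_objective(fake_scores, real_feats, fake_feats, feature_weight=100.0):
    """Weighted sum of the two terms; feature_weight is a hypothetical
    hyperparameter, not a value taken from the paper."""
    adv = hinge_generator_loss(fake_scores)
    feat = feature_loss(real_feats, fake_feats)
    return adv + feature_weight * feat
```

In a real training loop the scores and activations would come from a discriminator network evaluated on clean and enhanced waveforms; plain arrays stand in for them here so the arithmetic is easy to follow.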
Findings
Achieves high-quality speech enhancement in noisy conditions.
Effectively suppresses interfering speech at similar loudness levels.
Utilizes accelerometer data as a strong conditioning signal.
Abstract
We explore the possibility of leveraging accelerometer data to perform speech enhancement in very noisy conditions. Although the user's speech can only be partially reconstructed from the accelerometer, the latter provides a strong conditioning signal that is not influenced by noise sources in the environment. Based on this observation, we feed a multi-modal input to SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which adopts a combination of feature losses and adversarial losses to reconstruct an enhanced version of the user's speech. We trained our model with data collected by sensors mounted on an earbud and synthetically corrupted by adding different kinds of noise sources to the audio signal. Our experimental results demonstrate that it is possible to achieve very high quality results, even in the case of interfering speech at the same level of…
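The data preparation described in the abstract (clean audio synthetically corrupted with noise, then paired with the accelerometer signal as a multi-modal input) can be sketched as follows. This is a minimal illustration under assumptions: the function names `mix_at_snr` and `make_multimodal_input` are hypothetical, the accelerometer signal is assumed to be already resampled to the audio rate and length, and stacking the two signals as channels is one plausible way to form the input rather than a detail confirmed by the summary. A signal-to-noise ratio of 0 dB corresponds to an interferer with the same power as the target speech.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the speech-to-noise power ratio
    equals `snr_db` decibels."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if speech.shape != noise.shape:
        raise ValueError("speech and noise must have the same shape")
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise signal is silent")
    # Gain that brings the noise power to speech_power / 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def make_multimodal_input(noisy_audio, accel):
    """Stack the two signals into a (2, T) array: channel 0 is the noisy
    microphone audio, channel 1 is the accelerometer (assumed time-aligned)."""
    noisy_audio = np.asarray(noisy_audio, dtype=np.float64)
    accel = np.asarray(accel, dtype=np.float64)
    if noisy_audio.shape != accel.shape:
        raise ValueError("audio and accelerometer must have the same length")
    return np.stack([noisy_audio, accel], axis=0)
```

Only the audio channel is corrupted in this sketch, which mirrors the abstract's point that the accelerometer signal is not affected by noise sources in the environment.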
