A Study of Incorporating Articulatory Movement Information in Speech Enhancement
Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Xugang Lu, Yu Tsao

TL;DR
This paper introduces a multimodal speech enhancement model that incorporates articulatory movement data alongside audio signals, notably improving speech quality and intelligibility in challenging noise conditions.
Contribution
The study proposes a novel multimodal audio-articulatory-movement speech enhancement (AAMSE) model that uses articulatory movement features alongside acoustic signals, demonstrating improved performance over audio-only methods under difficult noise conditions.
Findings
AAMSE outperforms audio-only baselines in speech quality.
Combining modalities enhances intelligibility in noisy environments.
A limited number of articulatory movement sensors still provides notable improvements.
Abstract
Although deep learning algorithms are widely used for improving speech enhancement (SE) performance, the performance remains limited under highly challenging conditions, such as unseen noise or noise signals having low signal-to-noise ratios (SNRs). This study provides a pilot investigation on a novel multimodal audio-articulatory-movement SE (AAMSE) model to enhance SE performance under such challenging conditions. Articulatory movement features and acoustic signals were used as inputs to waveform-mapping-based and spectral-mapping-based SE systems with three fusion strategies. In addition, an ablation study was conducted to evaluate SE performance using a limited number of articulatory movement sensors. Experimental results confirm that, by combining the modalities, the AAMSE model notably improves the SE performance in terms of speech quality and intelligibility, as compared to…
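The abstract characterizes challenging conditions partly by low SNR. As a minimal sketch of what that means in practice, the following shows how a noisy test signal at a chosen SNR could be constructed, using the standard definition SNR(dB) = 10 log10(P_signal / P_noise). The function names `mix_at_snr` and `measure_snr_db`, the synthetic sine "speech", and the -5 dB target are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB.

    Hypothetical helper for illustration; not the paper's data pipeline.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if clean.shape != noise.shape:
        raise ValueError("clean and noise must have the same shape")

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if clean_power == 0.0 or noise_power == 0.0:
        raise ValueError("clean and noise must both have non-zero power")

    # SNR_dB = 10 * log10(P_clean / P_noise)  =>  P_noise = P_clean / 10^(SNR_dB / 10)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise


def measure_snr_db(clean, noisy):
    """Measure the SNR in dB of `noisy`, treating (noisy - clean) as the noise."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Synthetic stand-ins: a 220 Hz tone as "speech" and white Gaussian noise.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2.0 * np.pi * 220.0 * t)
noise = np.random.default_rng(0).standard_normal(sample_rate)

# A negative SNR means the noise carries more power than the speech.
noisy = mix_at_snr(clean, noise, snr_db=-5.0)
measured = measure_snr_db(clean, noisy)
```

At negative SNRs the noise power exceeds the speech power, which is the regime where an audio-only input carries the least usable information.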
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Advanced Adaptive Filtering Techniques
