Exploration of Adapter for Noise Robust Automatic Speech Recognition
Hao Shi, Tatsuya Kawahara

TL;DR
This paper investigates adapters for noise-robust automatic speech recognition, showing that adapters inserted in shallow layers are the most effective, that real data is more effective than the same amount of simulated data, and that integrating adapters into speech enhancement-based ASR systems is beneficial.
Contribution
It provides a comprehensive analysis of adapter-based adaptation in noisy ASR, highlighting the effectiveness of shallow layer adapters and the benefits of real data and speech enhancement integration.
Findings
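Adapters inserted in shallow layers are the most effective; adapting only the shallow layers performs comparably to adapting all layers.
For the same amount of data, real data is more effective than simulated data, although simulated data still helps under real noise conditions.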
Adapters combined with speech enhancement significantly improve ASR performance.
Abstract
Adapting an automatic speech recognition (ASR) system to unseen noise environments is crucial. Integrating adapters into neural networks has emerged as a potent technique for transfer learning. This study thoroughly investigates adapter-based ASR adaptation in noisy environments. We conducted experiments using the CHiME-4 dataset. The results show that inserting the adapter in the shallow layer yields superior effectiveness, and there is no significant difference between adapting solely within the shallow layer and adapting across all layers. The simulated data helps the system to improve its performance under real noise conditions. Nonetheless, when the amount of data is the same, the real data is more effective than the simulated data. Multi-condition training is still useful for adapter training. Furthermore, integrating adapters into speech enhancement-based ASR systems yields…
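To make the idea of "inserting the adapter in the shallow layer" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the residual bottleneck design (down-projection, ReLU, up-projection, skip connection), the zero-initialized up-projection, the toy `FrozenLayer`, and every name and size below are assumptions chosen for illustration. Training is not shown; in adapter-based adaptation only the adapter weights would be updated while the pretrained layers stay frozen.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(w, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


class Adapter:
    """Hypothetical residual bottleneck adapter: x + W_up @ relu(W_down @ x)."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Assumption: zero-initialize the up-projection so the adapter
        # starts as an identity mapping and does not disturb the frozen model.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        hidden = relu(matvec(self.w_down, x))
        update = matvec(self.w_up, hidden)
        return [xi + ui for xi, ui in zip(x, update)]


class FrozenLayer:
    """Toy stand-in for one pretrained encoder layer (weights never updated)."""

    def __init__(self, dim, rng):
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        return relu(matvec(self.w, x))


def build_encoder(num_layers, dim, bottleneck, adapter_layers, seed=0):
    rng = random.Random(seed)
    layers = [FrozenLayer(dim, rng) for _ in range(num_layers)]
    # Attach adapters only to the requested layer indices.
    adapters = {i: Adapter(dim, bottleneck, rng) for i in adapter_layers}
    return layers, adapters


def encode(x, layers, adapters):
    for i, layer in enumerate(layers):
        x = layer(x)
        if i in adapters:
            x = adapters[i](x)
    return x


# Shallow-only configuration: adapters after the first two of six layers.
# The indices and sizes are illustrative, not taken from the paper.
layers, adapters = build_encoder(num_layers=6, dim=8, bottleneck=2,
                                 adapter_layers=[0, 1])
x = [0.5] * 8
y_adapted = encode(x, layers, adapters)
y_frozen = encode(x, layers, {})
```

Passing `adapter_layers=range(6)` instead would give the all-layers configuration that the abstract compares against. Because the up-projection starts at zero, both configurations initially reproduce the frozen model's output exactly.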
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Adapter
