Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription
Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi (MULTISPEECH), Emmanuel Vincent (MULTISPEECH)

TL;DR
This paper presents a joint beamforming and speaker-attributed ASR system for improved distant-microphone meeting transcription, demonstrating significant WER reductions through joint optimization on real meeting data.
Contribution
It introduces a novel joint training approach combining neural beamforming with SA-ASR, enhancing performance over traditional methods.
Findings
Joint optimization reduces WER by up to 9%.
Pretraining the neural beamformer on real meeting data improves results.
Joint fine-tuning outperforms fixed beamforming approaches.
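As a reading aid for the WER figures above, here is a minimal sketch of how word error rate is commonly computed (word-level edit distance divided by reference length) and how an absolute and a relative reduction differ. The summary does not say which convention the 9% figure uses, and the function names and the example numbers below are hypothetical, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline: float, improved: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline - improved


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline WER that was removed."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical WERs: the same change reads differently under each convention.
    print(absolute_reduction(0.20, 0.18), relative_reduction(0.20, 0.18))
```

With these made-up numbers, a drop from 20% to 18% WER is a 2-point absolute reduction but a 10% relative one, which is why the convention matters when comparing reported gains.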
Abstract
Distant-microphone meeting transcription is a challenging task. State-of-the-art end-to-end speaker-attributed automatic speech recognition (SA-ASR) architectures lack a multichannel noise and reverberation reduction front-end, which limits their performance. In this paper, we introduce a joint beamforming and SA-ASR approach for real meeting transcription. We first describe a data alignment and augmentation method to pretrain a neural beamformer on real meeting data. We then compare fixed, hybrid, and fully neural beamformers as front-ends to the SA-ASR model. Finally, we jointly optimize the fully neural beamformer and the SA-ASR model. Experiments on the real AMI corpus show that, while state-of-the-art multi-frame cross-channel attention based channel fusion fails to improve ASR performance, fine-tuning SA-ASR on the fixed beamformer's output and jointly fine-tuning SA-ASR with the…
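To make the idea of a fixed beamforming front-end concrete, the following is a minimal delay-and-sum sketch: each microphone signal is advanced by its arrival delay and the aligned channels are averaged, so the target adds coherently while uncorrelated noise is attenuated. The abstract does not name the fixed beamformer it evaluates, so delay-and-sum is an assumption chosen for simplicity; the function name, the integer-sample delays, and the toy signals are all hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Average the channels after compensating each one's arrival delay.

    channels[m][t] is sample t of microphone m. delays[m] is the assumed
    non-negative number of samples by which the source reaches microphone m
    later than the earliest microphone (known in advance in this sketch).
    """
    if len(channels) == 0 or len(channels) != len(delays):
        raise ValueError("need one delay per channel and at least one channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Only keep output samples for which every shifted channel has data.
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    if n_out <= 0:
        raise ValueError("delays exceed the available signal length")
    output = []
    for t in range(n_out):
        aligned_sum = sum(ch[t + d] for ch, d in zip(channels, delays))
        output.append(aligned_sum / len(channels))
    return output


if __name__ == "__main__":
    # Toy example: the second microphone hears the source one sample late.
    source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
    mic0 = source
    mic1 = [0.0] + source[:-1]
    print(delay_and_sum([mic0, mic1], [0, 1]))
```

In practice the delays are estimated from the recordings rather than given, and the neural beamformers compared in the paper replace this fixed rule with learned filters; the sketch only illustrates the baseline concept.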
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Softmax · Attention Is All You Need
