E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

TL;DR
E-Branchformer improves on Branchformer by using a more effective method to merge its convolution and self-attention branches and by stacking additional point-wise modules. It achieves state-of-the-art speech recognition performance on LibriSpeech without external training data.
Contribution
The paper introduces E-Branchformer, an ASR encoder that extends Branchformer with an enhanced merging method for its local (convolution) and global (self-attention) branches, plus additional stacked point-wise modules, improving recognition accuracy.
Findings
Achieves new state-of-the-art WERs of 1.81% on LibriSpeech test-clean and 3.65% on test-other.
Outperforms previously reported models without using any external training data.
Demonstrates the effectiveness of the enhanced merging method and the additional stacked point-wise modules for speech recognition.
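The findings are stated as word error rates. The sketch below uses the standard definition of WER as (substitutions + deletions + insertions) divided by the number of reference words, computed with a word-level Levenshtein distance. It also shows the usual test-set aggregation: total errors divided by total reference words, not an average of per-utterance WERs. The paper's exact scoring setup, such as text normalization, is not given on this page. The function names are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    turning `ref` into `hyp` (Levenshtein distance over words)."""
    # prev[j] = distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Utterance-level WER = edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Test-set WER: total word errors over total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        errors += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return errors / words


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.2%}")  # 33.33%
```

Summing errors before dividing means long utterances weigh more than short ones. This matches how a single test-set figure such as 1.81% is normally reported.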
Abstract
Conformer, which combines convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state of the art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated branches of convolution and self-attention and merging the local and global context from each branch. In this paper, we propose E-Branchformer, which enhances Branchformer by applying an effective merging method and stacking additional point-wise modules. E-Branchformer sets new state-of-the-art word error rates (WERs) of 1.81% and 3.65% on the LibriSpeech test-clean and test-other sets, respectively, without using any external training data.
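The abstract describes merging the outputs of a local (convolution) branch and a global (self-attention) branch. The pure-Python sketch below shows one such enhanced merge, assuming the design as we understand it from the E-Branchformer paper. It concatenates the two branch outputs along the feature axis, applies a depth-wise convolution over time, adds that result back as a residual, and projects to the model dimension with a linear layer. The function names, toy shapes, bias-free layers, and zero "same" padding are illustrative assumptions. This is not the ESPnet implementation, and it omits the stacked point-wise (feed-forward) modules entirely.

```python
from typing import List

Matrix = List[List[float]]  # shape: (time frames, features)


def concat_features(a: Matrix, b: Matrix) -> Matrix:
    """Concatenate two (T, d) sequences along the feature axis -> (T, 2d)."""
    assert len(a) == len(b), "branches must produce the same number of frames"
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def depthwise_conv1d(x: Matrix, kernels: Matrix) -> Matrix:
    """Depth-wise 1-D convolution over time with zero 'same' padding.

    kernels[c] is the odd-length filter for channel c. Each channel is
    filtered independently (no cross-channel mixing), which is what makes
    it depth-wise. Implemented as cross-correlation, as in most
    deep-learning libraries.
    """
    num_frames, num_channels = len(x), len(x[0])
    k = len(kernels[0])
    half = k // 2
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            acc = 0.0
            for j in range(k):
                src = t + j - half
                if 0 <= src < num_frames:  # frames outside the sequence count as zero
                    acc += kernels[c][j] * x[src][c]
            out[t][c] = acc
    return out


def linear(x: Matrix, weight: Matrix) -> Matrix:
    """Frame-wise projection (T, d_in) x (d_in, d_out) -> (T, d_out); no bias."""
    d_out = len(weight[0])
    return [[sum(v * weight[i][o] for i, v in enumerate(row)) for o in range(d_out)]
            for row in x]


def enhanced_merge(global_out: Matrix, local_out: Matrix,
                   dw_kernels: Matrix, proj: Matrix) -> Matrix:
    """Concat -> depth-wise conv -> residual add -> linear projection."""
    z = concat_features(global_out, local_out)
    dz = depthwise_conv1d(z, dw_kernels)
    z = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(z, dz)]
    return linear(z, proj)


if __name__ == "__main__":
    g = [[1.0], [2.0], [3.0]]               # toy global-branch output, d = 1
    loc = [[0.0], [1.0], [0.0]]             # toy local-branch output, d = 1
    kernels = [[0.0, 1.0, 0.0],             # channel 0: identity filter
               [1.0, 1.0, 1.0]]             # channel 1: sums neighbouring frames
    proj = [[0.5], [1.0]]                   # (2d, d) projection
    print(enhanced_merge(g, loc, kernels, proj))  # [[2.0], [4.0], [4.0]]
```

Compared with projecting the concatenated features directly, the depth-wise convolution lets every channel of both branches draw on neighbouring frames before the two are mixed. It adds only k parameters per channel.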
Code & Models
- 🤗 espnet/xeus (model) · 33 downloads · 146 likes
- 🤗 pyf98/tedlium2_e_branchformer (model) · 2 downloads
- 🤗 pyf98/librispeech_100_e_branchformer (model) · 1 download · 1 like
- 🤗 pyf98/aishell_e_branchformer (model) · 6 downloads
- 🤗 pyf98/aidatatang_200zh_e_branchformer (model) · 1 download
- 🤗 pyf98/swbd_e_branchformer (model) · 1 download
- 🤗 pyf98/wsj_e_branchformer (model) · 2 downloads
- 🤗 pyf98/chime4_e_branchformer_e10 (model) · 4 downloads
- 🤗 pyf98/voxforge_it_e_branchformer (model) · 1 download
- 🤗 pyf98/tedlium2_ctc_e_branchformer (model) · 1 download
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: E-Branchformer
