BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder
Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

TL;DR
BECTRA is an end-to-end speech recognition model that pairs a BERT-enhanced encoder with a transducer, addressing the vocabulary mismatch between the pre-trained LM and the target ASR domain and improving recognition accuracy over BERT-CTC.
Contribution
The paper introduces BECTRA, which combines a BERT-CTC encoder with a transducer decoder trained on an ASR-specific vocabulary, together with a new inference algorithm. This lets end-to-end ASR draw on BERT's linguistic knowledge without being tied to BERT's vocabulary.
Findings
BECTRA outperforms BERT-CTC in various ASR tasks.
Effective handling of vocabulary mismatch improves recognition accuracy.
BECTRA combines autoregressive and non-autoregressive decoding for better performance.
Abstract
We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder. Integrating a large-scale pre-trained language model (LM) into E2E-ASR has been actively studied, aiming to utilize versatile linguistic knowledge for generating accurate text. One crucial factor that makes this integration challenging lies in the vocabulary mismatch; the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to have a mismatch against a target ASR domain. To overcome such an issue, we propose BECTRA, an extended version of our previous BERT-CTC, that realizes BERT-based E2E-ASR using a vocabulary of interest. BECTRA is a transducer-based model, which adopts BERT-CTC for its encoder and trains an ASR-specific decoder using a vocabulary suitable for a target task.…
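To give a rough sense of why a large LM vocabulary is costly for transducer training, the sketch below counts the logits a transducer joint network would produce for a single utterance. It is an illustration under stated assumptions, not the authors' code. The frame count, the token count, and the small ASR vocabulary size are hypothetical values. The figure 30,522 is the WordPiece vocabulary size of the publicly released bert-base-uncased model, which may differ from the BERT variant used in the paper. For simplicity the same token count is used for both vocabularies, although in practice a smaller vocabulary would split the text into more tokens.

```python
def joint_logit_count(num_frames: int, num_tokens: int, vocab_size: int) -> int:
    """Number of logits in a transducer joint output for one utterance.

    The joint network scores every (frame, label-prefix) pair over the
    vocabulary plus one blank symbol, giving a T x (U + 1) x (V + 1) tensor.
    """
    return num_frames * (num_tokens + 1) * (vocab_size + 1)


# Hypothetical utterance: 200 encoder frames, 30 output tokens.
T, U = 200, 30

bert_vocab = 30522  # WordPiece vocabulary size of bert-base-uncased
asr_vocab = 300     # hypothetical small ASR-specific subword vocabulary

bert_logits = joint_logit_count(T, U, bert_vocab)
asr_logits = joint_logit_count(T, U, asr_vocab)

print(f"BERT vocabulary: {bert_logits:,} logits")
print(f"ASR vocabulary:  {asr_logits:,} logits")
print(f"ratio: {bert_logits / asr_logits:.1f}x")
```

Because the joint output grows linearly with vocabulary size, training the transducer decoder on a compact task-specific vocabulary keeps this tensor small while the encoder can still use BERT.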
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Attention Dropout · Dense Connections · WordPiece · Linear Warmup With Linear Decay
