MTLM: Incorporating Bidirectional Text Information to Enhance Language Model Training in Speech Recognition Systems
Qingliang Meng, Pengju Ren, Tian Li, Changsong Dai, and Huizhi Liang

TL;DR
This paper introduces MTLM, a training paradigm that combines unidirectional and bidirectional language modeling to improve speech recognition accuracy and flexibility in decoding strategies.
Contribution
MTLM unifies unidirectional and bidirectional training objectives, enabling richer linguistic representations while maintaining compatibility with existing ASR decoding methods.
Findings
MTLM outperforms traditional unidirectional models on LibriSpeech.
Supports multiple decoding strategies including shallow fusion and n-best rescoring.
Enhances ASR performance by capturing richer context information.
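To make the n-best rescoring strategy named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes each hypothesis arrives with a first-pass log-score and is re-ranked by adding a weighted LM log-probability. The names `rescore_nbest`, `lm_weight`, and the toy lookup-table LM are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank (text, first_pass_log_score) pairs with an external LM.

    combined = first_pass_log_score + lm_weight * lm_log_prob(text)
    Returns the hypotheses sorted from best to worst combined score.
    """
    rescored = [(text, score + lm_weight * lm_score(text)) for text, score in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for a trained LM: a lookup table of made-up log-probabilities.
TOY_LM = {
    "i scream for ice cream": -2.0,
    "ice cream for ice cream": -6.0,
}


def toy_lm_score(text: str) -> float:
    """Hypothetical LM scorer; unseen text gets a very low log-probability."""
    return TOY_LM.get(text, -20.0)


nbest = [
    ("ice cream for ice cream", -3.8),  # first-pass favourite
    ("i scream for ice cream", -4.0),
]

best_text, best_score = rescore_nbest(nbest, toy_lm_score, lm_weight=0.5)[0]
print(best_text, best_score)
```

Shallow fusion applies the same weighted sum, but at every step of beam search rather than once per finished hypothesis, which is why it requires a left-to-right LM score for partial hypotheses.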
Abstract
Automatic speech recognition (ASR) systems normally consist of an acoustic model (AM) and a language model (LM). The acoustic model estimates the probability distribution of text given the input speech, while the language model calibrates this distribution toward a specific knowledge domain to produce the final transcription. Traditional ASR-specific LMs are typically trained in a unidirectional (left-to-right) manner to align with autoregressive decoding. However, this prevents the model from leveraging right-side context during training, limiting its representational capacity. In this work, we propose MTLM, a novel training paradigm that unifies unidirectional and bidirectional training through three training objectives: ULM, BMLM, and UMLM. This approach enhances the LM's ability to capture richer linguistic patterns from both left and right contexts while preserving compatibility…
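The contrast between left-to-right and bidirectional context can be illustrated with attention masks. This is a generic sketch under the assumption of a Transformer-style LM; the abstract does not state how MTLM implements its objectives, and the helper names `causal_mask` and `bidirectional_mask` are hypothetical.

```python
import numpy as np


def causal_mask(n: int) -> np.ndarray:
    """Row i may attend to columns 0..i only (left-to-right context)."""
    return np.tril(np.ones((n, n), dtype=bool))


def bidirectional_mask(n: int) -> np.ndarray:
    """Every position may attend to every other position."""
    return np.ones((n, n), dtype=bool)


# For a 4-token sequence, print which positions each token can see.
print(causal_mask(4).astype(int))
print(bidirectional_mask(4).astype(int))
```

Under the causal mask, the entries above the diagonal are blocked, so no token sees anything to its right. That is the restriction the abstract describes for unidirectional training.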
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques
Methods: ALIGN
