MTLM: Incorporating Bidirectional Text Information to Enhance Language Model Training in Speech Recognition Systems

Qingliang Meng, Pengju Ren, Tian Li, Changsong Dai, and Huizhi Liang

arXiv:2502.10058 · cs.CL · June 17, 2025

Open Access

TL;DR

This paper introduces MTLM, a training paradigm that combines unidirectional and bidirectional language modeling to improve speech recognition accuracy and flexibility in decoding strategies.

Contribution

MTLM unifies unidirectional and bidirectional training objectives, enabling richer linguistic representations while maintaining compatibility with existing ASR decoding methods.

Findings

01. MTLM outperforms traditional unidirectional models on LibriSpeech.

02. Supports multiple decoding strategies including shallow fusion and n-best rescoring.

03. Enhances ASR performance by capturing richer context information.
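
To make the n-best rescoring strategy named in the findings concrete, here is a minimal Python sketch of the general technique, not the paper's implementation. The function names, the toy unigram scores, and the `lm_weight` value are all hypothetical and chosen only for illustration. Each hypothesis from the acoustic model is re-ranked by its acoustic log-probability plus a weighted language model log-probability.

```python
import math


def rescore_nbest(hypotheses, lm_score, lm_weight=0.3):
    """Re-rank (text, am_logprob) pairs by am_logprob + lm_weight * lm_logprob.

    `lm_score` is any callable that returns a log-probability for a string.
    """
    rescored = [
        (text, am_logprob + lm_weight * lm_score(text))
        for text, am_logprob in hypotheses
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy LM: made-up unigram log-probabilities, for illustration only.
TOY_LOGPROBS = {
    "recognize": math.log(0.05),
    "speech": math.log(0.05),
    "wreck": math.log(0.001),
    "a": math.log(0.1),
    "nice": math.log(0.02),
    "beach": math.log(0.01),
}


def toy_lm_score(text):
    """Sum unigram log-probabilities, with a small floor for unknown words."""
    return sum(TOY_LOGPROBS.get(word, math.log(1e-6)) for word in text.split())


# The acoustic model slightly prefers the wrong hypothesis (made-up scores).
nbest = [("wreck a nice beach", -4.0), ("recognize speech", -4.5)]
best_text, best_score = rescore_nbest(nbest, toy_lm_score, lm_weight=0.5)[0]
print(best_text)
```

Shallow fusion uses the same kind of weighted sum, but applies it at every step of beam search rather than once over complete hypotheses.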

Abstract

Automatic speech recognition (ASR) systems normally consist of an acoustic model (AM) and a language model (LM). The acoustic model estimates the probability distribution of text given the input speech, while the language model calibrates this distribution toward a specific knowledge domain to produce the final transcription. Traditional ASR-specific LMs are typically trained in a unidirectional (left-to-right) manner to align with autoregressive decoding. However, this restricts the model from leveraging the right-side context during training, limiting its representational capacity. In this work, we propose MTLM, a novel training paradigm that unifies unidirectional and bidirectional training through three objectives: ULM, BMLM, and UMLM. This approach enhances the LM's ability to capture richer linguistic patterns from both left and right contexts while preserving compatibility…
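
The contrast the abstract draws between left-only and two-sided context can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the paper's definition of its ULM, BMLM, or UMLM objectives; the helper names and the `[MASK]` placeholder are hypothetical.

```python
def causal_context(tokens, position):
    """Left-to-right LM: only tokens before `position` are visible."""
    return tokens[:position]


def bidirectional_context(tokens, position, mask_token="[MASK]"):
    """Masked LM: the target is hidden, tokens on both sides stay visible."""
    return tokens[:position] + [mask_token] + tokens[position + 1:]


tokens = "the cat sat on the mat".split()
left_only = causal_context(tokens, 2)          # context for predicting "sat"
both_sides = bidirectional_context(tokens, 2)  # same target, two-sided context
print(left_only)
print(both_sides)
```

A purely left-to-right model never sees "on the mat" when predicting "sat" during training, which is the limitation the abstract describes.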

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques

Methods: ALIGN