Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Hongfei Xue, Wei Ren, Xuelong Geng, Kun Wei, Longhao Li, Qijie Shao, Linju Yang, Kai Diao, Lei Xie

TL;DR
Ideal-LLM combines dual multilingual encoders with a language-adapted connector for multilingual speech-to-text, improving both recognition accuracy and translation quality by adapting to each language specifically.
Contribution
The paper presents a multilingual speech-to-text model that integrates dual encoders with a language-adapted connector, and reports improvements over existing methods on both ASR and AST.
Findings
Achieves 32.6% relative reduction in word error rate for ASR.
Attains an average BLEU score of 36.78 for speech translation.
Utilizes complementary encoders and language weights for better multilingual representation.
Abstract
Integrating audio encoders with LLMs through connectors has enabled these models to process and comprehend audio modalities, significantly enhancing speech-to-text tasks, including automatic speech recognition (ASR) and automatic speech translation (AST). However, these methods often overlook the critical aspect of language adaptation in multilingual settings, relying instead on multilingual data without adequately addressing language differences. To address this gap, we propose the Ideal-LLM model, which employs dual multilingual encoders to enrich language feature information and utilizes a language-adapted connector to target the adaptation of each language specifically. By leveraging the complementary strengths of Whisper and MMS encoders, our approach ensures richer multilingual representations. Additionally, the language-adapted connector enhances modal transformation via a…
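To make the idea of combining two encoders with language weights concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `fuse_encoders`, the per-language scalar logits in `lang_logits`, and the sigmoid-gated weighted sum are all assumptions chosen for illustration, and the two feature sequences are assumed to be already aligned to the same number of frames and the same feature dimension.

```python
import math
from typing import Dict, List

# One utterance as a list of frames; each frame is a list of floats.
Frames = List[List[float]]


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_encoders(
    feats_a: Frames,
    feats_b: Frames,
    lang: str,
    lang_logits: Dict[str, float],
) -> Frames:
    """Blend two aligned encoder outputs with a language-specific weight.

    Hypothetical scheme: each language owns one learnable logit; its
    sigmoid w scales encoder A and (1 - w) scales encoder B.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("encoder outputs must have the same number of frames")
    w = sigmoid(lang_logits[lang])
    fused: Frames = []
    for frame_a, frame_b in zip(feats_a, feats_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same feature dimension")
        fused.append([w * a + (1.0 - w) * b for a, b in zip(frame_a, frame_b)])
    return fused


# Toy usage: a logit of 0.0 gives equal weight to both encoders.
example = fuse_encoders([[1.0, 2.0]], [[3.0, 4.0]], "en", {"en": 0.0})
```

In a trained system the logits would be learned per language, so a language better served by one encoder would shift its weight toward that encoder; the fused frames would then pass through a connector into the LLM.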
Taxonomy
Topics: Natural Language Processing Techniques
