Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge

Xiaoxiao Li; An Zhu; Youhai Jiang; Fengjie Zhu

arXiv:2508.14916 · eess.AS · August 22, 2025

TL;DR

This paper introduces a multilingual speech recognition system that combines frozen pretrained speech and language models with trainable adaptation modules, reaching a 9.83% WER/CER across 11 languages and third place in Track 1 of the MLC-SLM 2025 Challenge.

Contribution

The paper presents an architecture for multilingual ASR that couples a frozen Whisper-large-v3 speech encoder and a frozen Qwen2.5-7B-Instruct LLM with two trainable modules: a Linear-ReLU-Linear adaptor and LoRA weights.

Findings

01 Achieved 9.83% WER/CER across 11 languages on the evaluation set

02 Ranked third among global participants

03 Effective integration of pretrained models with task-specific adaptation
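To make the WER/CER figure concrete, here is a minimal sketch of how word and character error rates are commonly computed: the Levenshtein distance between reference and hypothesis tokens, divided by the reference length. The function names, the whitespace tokenization, and the example sentences are illustrative assumptions. The challenge's own scoring and text normalization are not described here and may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref, hyp):
    """Word error rate: tokens are whitespace-separated words (assumption)."""
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    """Character error rate: tokens are characters, spaces ignored (assumption)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


# Made-up examples, not challenge data.
word_score = wer("the cat sat on the mat", "the cat sit on mat")
char_score = cer("你好世界", "你好视界")
```

In the first example one substitution and one deletion against six reference words give a WER of 2/6. In the second, one substituted character out of four gives a CER of 0.25.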

Abstract

This paper presents the architecture and performance of a novel Multilingual Automatic Speech Recognition (ASR) system developed by the Transsion Speech Team for Track 1 of the MLC-SLM 2025 Challenge. The proposed system comprises three key components: 1) a frozen Whisper-large-v3-based speech encoder, leveraging large-scale pretraining to ensure robust acoustic feature extraction; 2) a trainable adaptor module using Linear-ReLU-Linear transformation mechanisms to effectively align speech and text representations; and 3) a frozen Qwen2.5-7B-Instruct large language model (LLM) integrated with trainable LoRA for optimized contextual linguistic decoding. By systematically combining pretrained models with task-specific fine-tuning, the system achieved a word/character error rate (WER/CER) of 9.83% across 11 languages in the evaluation set and ranked third among global participants.
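The two trainable pieces named in the abstract can be illustrated with a small NumPy sketch: a Linear-ReLU-Linear adaptor that projects encoder features into the LLM embedding space, and a LoRA-style linear layer in which the frozen weight is left untouched and only a low-rank update is trained. This is a sketch under stated assumptions, not the authors' implementation: the dimensions are toy values, random arrays stand in for Whisper encoder outputs and Qwen weights, and all names (`adaptor`, `lora_linear`, `ALPHA`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen only for illustration (not the paper's dimensions).
ENC_DIM, HID_DIM, LLM_DIM, RANK, ALPHA = 8, 16, 12, 2, 4.0


def adaptor(speech_feats, w1, b1, w2, b2):
    """Linear-ReLU-Linear projection from encoder space to LLM embedding space."""
    hidden = np.maximum(speech_feats @ w1 + b1, 0.0)  # first Linear + ReLU
    return hidden @ w2 + b2                           # second Linear


def lora_linear(x, w_frozen, lora_a, lora_b, alpha=ALPHA):
    """Frozen weight plus a trainable low-rank update: xW + (alpha / r) * xAB."""
    rank = lora_a.shape[1]
    return x @ w_frozen + (alpha / rank) * (x @ lora_a @ lora_b)


# Trainable adaptor parameters (random initialization for the sketch).
w1 = rng.normal(size=(ENC_DIM, HID_DIM))
b1 = np.zeros(HID_DIM)
w2 = rng.normal(size=(HID_DIM, LLM_DIM))
b2 = np.zeros(LLM_DIM)

# Stand-in for the output of the frozen speech encoder: 5 frames of features.
frames = rng.normal(size=(5, ENC_DIM))
speech_embeds = adaptor(frames, w1, b1, w2, b2)

# Stand-in for one frozen LLM projection, with trainable LoRA factors.
w_frozen = rng.normal(size=(LLM_DIM, LLM_DIM))   # never updated
lora_a = rng.normal(size=(LLM_DIM, RANK))        # trainable
lora_b = np.zeros((RANK, LLM_DIM))               # trainable, zero-initialized

out = lora_linear(speech_embeds, w_frozen, lora_a, lora_b)
```

Because one LoRA factor starts at zero, the layer initially reproduces the frozen model's output exactly, and training moves it away from that baseline only through the low-rank factors.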

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Face recognition and analysis