Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models
Bo Li, Chengben Xu, Wufeng Zhang

TL;DR
Seewo's submission to the MLC-SLM challenge introduces a multi-stage training pipeline that combines curriculum learning, Chain-of-Thought data augmentation, and Reinforcement Learning with Verifiable Rewards (RLVR) for speech recognition and speaker diarization with ASR, achieving substantial improvements over the official challenge baselines.
Contribution
The paper presents a novel multi-stage training approach incorporating curriculum learning, Chain-of-Thought augmentation, and RLVR for enhanced speech reasoning and self-correction.
Findings
Achieved a WER/CER of 11.57% on Track 1 (ASR) and a tcpWER/tcpCER of 17.67% on Track 2 (SD-ASR) on the evaluation set.
Demonstrated the effectiveness of each training component through ablation studies.
Significantly outperformed official challenge baselines.
Abstract
This paper presents Seewo's systems for both tracks of the Multilingual Conversational Speech Language Model Challenge (MLC-SLM), addressing automatic speech recognition (ASR) and speaker diarization with ASR (SD-ASR). We introduce a multi-stage training pipeline that explicitly enhances reasoning and self-correction in speech language models for ASR. Our approach combines curriculum learning for progressive capability acquisition, Chain-of-Thought data augmentation to foster intermediate reflection, and Reinforcement Learning with Verifiable Rewards (RLVR) to further refine self-correction through reward-driven optimization. This approach achieves substantial improvements over the official challenge baselines. On the evaluation set, our best system attains a WER/CER of 11.57% for Track 1 and a tcpWER/tcpCER of 17.67% for Track 2. Comprehensive ablation studies demonstrate the…
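As a rough illustration of the metrics quoted above, the sketch below computes WER and CER from a Levenshtein edit distance. It also shows one way a transcript-level score could be turned into a verifiable reward. The abstract does not specify the reward used for RLVR, so `verifiable_reward` is a hypothetical helper, not the authors' formulation. The whitespace tokenization and the removal of spaces for CER are likewise simplifying assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference length."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref = list(reference.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, list(hypothesis.replace(" ", ""))) / len(ref)


def verifiable_reward(reference, hypothesis):
    """Hypothetical reward in [0, 1]: higher means fewer word errors."""
    return 1.0 - min(wer(reference, hypothesis), 1.0)
```

Because the error rate can exceed 1 when the hypothesis contains many insertions, the hypothetical reward clips it so the result stays in the range 0 to 1.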
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques
