TL;DR
MooER is an LLM-based speech recognition and translation model from Moore Threads, trained on 5,000 hours of pseudo-labeled data. It performs comparably to open-source models trained on up to hundreds of thousands of hours of labeled speech, and the authors plan an open-source release.
Contribution
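This paper presents a training strategy for encoders and LLMs on speech tasks (ASR and AST) that uses a small amount of pseudo-labeled data, with no extra manual annotation or selection. The authors release the ASR and AST models and plan to open-source the training code.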
Findings
Achieves a BLEU score of 25.2 on the CoVoST2 Zh2en test set.
Performs comparably to open-source models trained on up to hundreds of thousands of hours of labeled speech data.
Outperforms other open-source speech LLMs on the CoVoST2 Zh2en test set.
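For readers unfamiliar with the metric behind the 25.2 figure, the following is a minimal sketch of how a corpus-level BLEU score is computed. It is not the paper's evaluation script: the whitespace tokenization, the single reference per hypothesis, the absence of smoothing, and the function names ngram_counts and corpus_bleu are all assumptions made for illustration. Published scores usually come from a standardized tool with its own tokenization, so numbers from this sketch are not directly comparable.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (illustrative only): whitespace tokenization, exactly one
    reference per hypothesis, and no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tok, n)
            ref_ngrams = ngram_counts(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            total[n - 1] += sum(hyp_ngrams.values())

    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # unsmoothed: any order with zero matches gives BLEU 0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(f"BLEU = {corpus_bleu(hyps, refs):.2f}")
```

The score combines clipped n-gram precisions for orders 1 through 4 with a brevity penalty, so a translation must be both accurate and roughly as long as the reference to score well.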
Abstract
In this paper, we present MooER, an LLM-based large-scale automatic speech recognition (ASR) / automatic speech translation (AST) model from Moore Threads. A 5,000-hour pseudo-labeled dataset containing open-source and self-collected speech data is used for training. We achieve performance comparable to other open-source models trained with up to hundreds of thousands of hours of labeled speech data. Meanwhile, experiments conducted on the CoVoST2 Zh2en test set suggest that our model outperforms other open-source speech LLMs, obtaining a BLEU score of 25.2. The main contributions of this paper are summarized as follows. First, this paper presents a training strategy for encoders and LLMs on speech-related tasks (including ASR and AST) using a small amount of pseudo-labeled data without any extra manual annotation or selection. Second, we release our ASR and AST models and plan to open-source…
