FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

Kai-Tuo Xu; Feng-Long Xie; Xu Tang; Yao Hu

arXiv:2501.14350 · eess.AS · January 27, 2025

Open Access · 1 Repo · 6 Models

TL;DR

FireRedASR introduces large-scale Mandarin speech recognition models, including an LLM-integrated variant for high accuracy and an efficient encoder-decoder model, achieving state-of-the-art results and broad applicability.

Contribution

The paper presents FireRedASR, a new family of Mandarin ASR models with LLM integration and efficient architecture, surpassing existing SOTA performance and supporting diverse speech recognition scenarios.

Findings

01

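FireRedASR-LLM (8.3B parameters) achieves an average CER of 3.05% on public Mandarin benchmarks, an 8.4% relative CER reduction over the previous SOTA of 3.33%.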

02

FireRedASR-AED achieves 3.18% CER, outperforming larger models.

03

Both models perform well on dialects, English speech, and singing lyrics.
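
The 8.4% figure in the first finding is a relative reduction, not an absolute gap in CER points. A minimal sketch of that arithmetic, using the two averages reported in the abstract (the function name `relative_cer_reduction` is illustrative and not taken from the paper or its repository):

```python
def relative_cer_reduction(baseline_cer: float, new_cer: float) -> float:
    """Return the relative CER reduction (CERR) as a percentage of the baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - new_cer) / baseline_cer * 100.0


# Values from the abstract: previous SOTA 3.33%, FireRedASR-LLM 3.05%.
cerr = relative_cer_reduction(3.33, 3.05)
print(f"CERR: {cerr:.1f}%")  # CERR: 8.4%
```

The absolute difference is 0.28 CER points; dividing by the 3.33% baseline gives the reported 8.4%.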

Abstract

We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements for superior performance and optimal efficiency across various applications. FireRedASR comprises two variants:

FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework leveraging large language model (LLM) capabilities. On public Mandarin benchmarks, FireRedASR-LLM (8.3B parameters) achieves an average Character Error Rate (CER) of 3.05%, surpassing the latest SOTA of 3.33% with an 8.4% relative CER reduction (CERR). It demonstrates superior generalization capability over industrial-grade baselines, achieving 24%-40% CERR in multi-source Mandarin ASR scenarios such as video, live, and intelligent assistant.

FireRedASR-AED: Designed to…
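
For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between the reference and the hypothesis, divided by the reference length. The following is a minimal sketch of that standard definition, not the paper's evaluation script; the function names and the example sentences are made up for illustration, and any text normalization or benchmark averaging the authors apply is not modeled here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate in percent for one reference/hypothesis pair."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref) * 100.0


# Hypothetical example: one substituted character in a 6-character reference.
print(f"{cer('今天天气很好', '今天天汽很好'):.2f}%")  # 16.67%
```

Because Mandarin text is scored per character, no word segmentation is needed, which is why CER rather than word error rate is the usual metric for Mandarin benchmarks.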

Code & Models

Repositories

fireredteam/fireredasr (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis