FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System

Kaituo Xu; Yan Jia; Kai Huang; Junjie Chen; Wenpeng Li; Kun Liu; Feng-Long Xie; Xu Tang; Yao Hu

arXiv:2603.10420·eess.AS·March 12, 2026

Open Access · 5 Models

TL;DR

FireRedASR2S is an open-source, industrial-grade speech recognition system that integrates four modules (ASR, VAD, LID, and punctuation prediction) in a unified pipeline and reports state-of-the-art accuracy across languages, dialects, and tasks.

Contribution

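The paper introduces FireRedASR2S, an all-in-one ASR system whose modules cover speech and singing transcription, voice activity detection, spoken language identification, and punctuation prediction. Its ASR module outperforms competitive baselines, including Doubao-ASR and Qwen3-ASR, on Mandarin and Chinese dialect and accent benchmarks.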

Findings

01

FireRedASR2-LLM achieves 2.89% average CER on 4 public Mandarin benchmarks.

02

The VAD module achieves a 97.57% frame-level F1 score.

03

The LID module achieves 97.18% accuracy on 82 languages.
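For readers unfamiliar with the VAD metric, frame-level F1 can be illustrated as follows. This is a minimal sketch, assuming binary speech/non-speech labels per frame and speech as the positive class; the function name `frame_f1` is hypothetical, and the paper's exact frame size and scoring setup are not given in this listing.

```python
from typing import Sequence


def frame_f1(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """F1 over frames, treating 1 (speech) as the positive class.

    Assumption: both sequences hold one 0/1 label per frame and are aligned.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same number of frames")

    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for r, p in pairs if r == 1 and p == 1)
    false_pos = sum(1 for r, p in pairs if r == 0 and p == 1)
    false_neg = sum(1 for r, p in pairs if r == 1 and p == 0)

    if true_pos == 0:
        # No correctly detected speech frames: precision or recall is zero.
        return 0.0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = [1, 1, 1, 0, 0, 1]
    hyp = [1, 1, 0, 0, 1, 1]
    print(f"frame-level F1: {frame_f1(ref, hyp):.2f}")
```

In this toy example there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75.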

Abstract

We present FireRedASR2S, a state-of-the-art industrial-grade all-in-one automatic speech recognition (ASR) system. It integrates four modules in a unified pipeline: ASR, Voice Activity Detection (VAD), Spoken Language Identification (LID), and Punctuation Prediction (Punc). All modules achieve SOTA performance on the evaluated benchmarks:

FireRedASR2: An ASR module with two variants, FireRedASR2-LLM (8B+ parameters) and FireRedASR2-AED (1B+ parameters), supporting speech and singing transcription for Mandarin, Chinese dialects and accents, English, and code-switching. Compared to FireRedASR, FireRedASR2 delivers improved recognition accuracy and broader dialect and accent coverage. FireRedASR2-LLM achieves 2.89% average CER on 4 public Mandarin benchmarks and 11.55% on 19 public Chinese dialects and accents benchmarks, outperforming competitive baselines including Doubao-ASR, Qwen3-ASR,…
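The character error rate (CER) quoted in the abstract is conventionally the character-level edit distance between hypothesis and reference, divided by the number of reference characters. The sketch below shows that standard definition; the names `edit_distance`, `cer`, and `corpus_cer` are illustrative, and the paper's own text normalization and the way it averages across benchmarks are not specified in this listing.

```python
from typing import Iterable, Tuple


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1       # reference character missing from hypothesis
            insertion = current[j - 1] + 1   # extra character in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """CER for one utterance: edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """CER over a test set: total edits divided by total reference characters."""
    total_edits = 0
    total_chars = 0
    for reference, hypothesis in pairs:
        total_edits += edit_distance(reference, hypothesis)
        total_chars += len(reference)
    if total_chars == 0:
        raise ValueError("test set contains no reference characters")
    return total_edits / total_chars


if __name__ == "__main__":
    # One substituted character in a six-character Mandarin reference.
    print(f"{cer('今天天气很好', '今天天汽很好'):.4f}")
```

Pooling edits over the whole test set, as `corpus_cer` does, weights long utterances more heavily than averaging per-utterance CER; which convention a given benchmark uses should be checked against its scoring script.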

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing