BART based semantic correction for Mandarin automatic speech recognition system
Yun Zhao, Xuerui Yang, Jinchao Wang, Yongyu Gao, Chao Yan, Yuanfu Zhou

TL;DR
This paper introduces a BART-based semantic correction method for Mandarin ASR that reduces character error rate (CER) by 21.7% relative to the baseline system; expert evaluation indicates the actual improvement is larger than the CER figure suggests.
Contribution
It proposes a Transformer-based semantic correction model, initialized from pretrained BART, that post-processes Mandarin ASR output to correct recognition errors.
Findings
CER reduced by 21.7% relative to baseline
Expert evaluation indicates the actual improvement exceeds what CER reflects
Evaluated on a 10,000-hour Mandarin speech dataset
Abstract
Although automatic speech recognition (ASR) systems have improved significantly in recent years, they still make recognition errors that humans can easily spot. Various language modeling techniques have been developed for post-recognition tasks such as semantic correction. In this paper, we propose a Transformer-based semantic correction method with pretrained BART initialization. Experiments on a 10,000-hour Mandarin speech dataset show that the character error rate (CER) is reduced by 21.7% relative to our baseline ASR system. Expert evaluation demonstrates that the actual improvement of our model surpasses what CER indicates.
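To make the headline metric concrete, the following is a minimal sketch of how CER and a relative CER reduction are commonly computed: character-level edit distance divided by reference length, then (baseline - corrected) / baseline. The function names and the toy sentences are illustrative assumptions, not taken from the paper, and the toy numbers do not reproduce the reported 21.7%.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def relative_reduction(baseline: float, corrected: float) -> float:
    """Relative error reduction of `corrected` with respect to `baseline`."""
    return (baseline - corrected) / baseline


# Toy data (hypothetical, for illustration only).
refs = ["今天天气很好", "我想去北京"]
asr_output = ["今天天汽很好", "我想去背景"]   # raw ASR hypotheses
corrected = ["今天天气很好", "我想去背景"]    # after a hypothetical correction step

baseline_cer = cer(refs, asr_output)
corrected_cer = cer(refs, corrected)
print(f"baseline CER:  {baseline_cer:.4f}")
print(f"corrected CER: {corrected_cer:.4f}")
print(f"relative reduction: {relative_reduction(baseline_cer, corrected_cer):.1%}")
```

Note that a relative reduction is a ratio of error rates, so a 21.7% relative reduction does not mean CER dropped by 21.7 absolute percentage points.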
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout · Residual Connection · Label Smoothing · Byte Pair Encoding · Multi-Head Attention
