CPPF: A contextual and post-processing-free model for automatic speech recognition

Lei Zhang; Zhengkun Tian; Xiang Chen; Jiaming Sun; Hongyu Xiang; Ke Ding; Guanglu Wan

arXiv:2309.07413 · cs.CL · September 22, 2023

TL;DR

The paper introduces CPPF, a novel ASR model that integrates multiple post-processing tasks directly into the recognition process, eliminating the need for separate post-processing steps and reducing error propagation.

Contribution

This work presents CPPF, a unified ASR model that incorporates contextual ASR and ASR post-processing tasks, streamlining the pipeline and enhancing recognition accuracy without a separate post-processing step.

Findings

01. CPPF integrates multiple tasks without significant loss in recognition performance.

02. The model shortens the multi-stage processing pipeline.

03. It prevents errors from cascading between separate processing stages.

Abstract

Automatic speech recognition (ASR) systems have become increasingly widespread in recent years. However, their textual output often requires post-processing before it can be used in practice. To address this issue, we draw inspiration from the multifaceted capabilities of large language models (LLMs) and Whisper, and integrate multiple text-processing tasks related to speech recognition into the ASR model itself. This integration not only shortens the multi-stage pipeline but also prevents errors from cascading between stages, so the model directly generates post-processed text. In this study, we focus on ASR-related processing tasks, including contextual ASR and multiple ASR post-processing tasks. To this end, we introduce the CPPF model, which offers a versatile and highly effective alternative to multi-stage ASR processing. CPPF seamlessly integrates these tasks without any significant loss in recognition…
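
The abstract does not say how CPPF selects among its tasks. Since it cites Whisper as inspiration, the following is a minimal sketch of the general Whisper-style idea of steering a single model with special tokens in the decoder prefix, so that one decoding pass yields post-processed text. Apart from `<|startoftranscript|>` (the name of Whisper's start-of-transcript token), every token and the helper `build_decoder_prefix` are hypothetical; this is an illustration of the technique, not CPPF's documented input format.

```python
from typing import Sequence

# Hypothetical task tokens for illustration only; the CPPF page does not
# list the model's actual task vocabulary.
TASK_TOKENS = {
    "punctuation": "<|punc|>",
    "itn": "<|itn|>",  # inverse text normalization, e.g. "twenty" -> "20"
}
CONTEXT_TOKEN = "<|context|>"        # hypothetical marker for contextual phrases
SOT_TOKEN = "<|startoftranscript|>"  # same name as Whisper's start token


def build_decoder_prefix(tasks: Sequence[str], context_phrases: Sequence[str] = ()) -> str:
    """Build a decoder prefix that asks one model for post-processed output.

    Context phrases (contextual ASR) are placed first so the decoder is
    conditioned on them; task tokens then select which processing steps
    the model should apply while it transcribes.
    """
    parts = []
    if context_phrases:
        parts.append(CONTEXT_TOKEN + ", ".join(context_phrases))
    parts.append(SOT_TOKEN)
    for task in tasks:
        if task not in TASK_TOKENS:
            raise ValueError(f"unknown task: {task}")
        parts.append(TASK_TOKENS[task])
    return "".join(parts)


print(build_decoder_prefix(["punctuation", "itn"], context_phrases=["CPPF"]))
# <|context|>CPPF<|startoftranscript|><|punc|><|itn|>
```

Under this kind of scheme, adding or dropping a processing step changes only the prefix rather than adding another stage to the pipeline.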

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Focus