TL;DR
CrossASR++ is a modular, extensible framework that automates differential testing of ASR systems using TTS-generated audio, improving the efficiency and coverage of failure detection over previous methods.
Contribution
We introduce CrossASR++, an enhanced, flexible testing tool that incorporates multiple TTS systems, ASR systems, and failure estimators, enabling more effective and scalable ASR system testing.
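One common way to make such a tool extensible is a plug-in registry: each ASR (and, by the same pattern, each TTS system or failure estimator) is wrapped behind a small interface and registered by name, so the testing loop never needs to change when a new system is added. The sketch below is a hypothetical illustration of that pattern only; `ASRBase`, `ASR_REGISTRY`, `register_asr`, `EchoASR`, and `build_asrs` are invented names, not CrossASR++'s actual classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Type


class ASRBase(ABC):
    """Hypothetical interface that every pluggable ASR wrapper implements."""

    name: str = "base"

    @abstractmethod
    def recognize(self, audio: bytes) -> str:
        """Return the transcription of the given audio."""


# Hypothetical registry: adding an ASR means registering a subclass,
# not editing the code that iterates over the available systems.
ASR_REGISTRY: Dict[str, Type[ASRBase]] = {}


def register_asr(cls: Type[ASRBase]) -> Type[ASRBase]:
    """Class decorator that records an ASR wrapper under its name."""
    ASR_REGISTRY[cls.name] = cls
    return cls


@register_asr
class EchoASR(ASRBase):
    """Toy stand-in that 'recognizes' UTF-8 bytes by decoding them."""

    name = "echo"

    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8")


def build_asrs(names: Iterable[str]) -> Dict[str, ASRBase]:
    """Instantiate the selected ASR wrappers by registered name."""
    return {n: ASR_REGISTRY[n]() for n in names}
```

Because the registry is keyed by name, enabling one more ASR in an experiment becomes a configuration change (another name passed to `build_asrs`) rather than a code change in the test driver.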
Findings
CrossASR++ discovers 26.2% more failed test cases than the original CrossASR
Adding an extra ASR increases failure detection by up to 39.63%
Using advanced estimators improves failure detection by 10.41%
Abstract
Developers need to perform adequate testing to ensure the quality of Automatic Speech Recognition (ASR) systems. However, manually collecting the required test cases is tedious and time-consuming. Our recent work proposes CrossASR, a differential testing method for ASR systems. This method first uses Text-to-Speech (TTS) to generate audio from texts automatically and then feeds this audio into different ASR systems for cross-referencing to uncover failed test cases. It also leverages a failure estimator to find failing test cases more efficiently. Such a method is inherently self-improvable: its performance can increase by leveraging more advanced TTS and ASR systems. So, in this accompanying tool demo paper, we devote more engineering effort and propose CrossASR++, an easy-to-use ASR testing tool that can be conveniently extended to incorporate different TTS and ASR systems, and failure…
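The cross-referencing idea described in the abstract can be sketched in Python as follows. This is a minimal sketch, not the actual CrossASR or CrossASR++ code: `normalize`, `cross_reference`, and `prioritize` are hypothetical names, TTS and ASR systems are modeled as plain callables, and the labeling rule is an assumption about how cross-referencing decides failures. Under that assumed rule, an ASR whose transcription does not match the input text counts as failed only when at least one other ASR transcribes the same audio correctly; if no ASR matches, the case is indeterminable, since the TTS audio itself may be faulty. The failure estimator is modeled as any function that returns a predicted failure score for a text.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-ins: a TTS maps text -> audio, an ASR maps audio -> text.
TTS = Callable[[str], bytes]
ASR = Callable[[bytes], str]


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so trivial formatting differences are ignored (assumption)."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def cross_reference(text: str, tts: TTS, asrs: Dict[str, ASR]) -> Dict[str, str]:
    """Label each ASR's result on one TTS-generated audio (assumed labeling rule)."""
    audio = tts(text)
    target = normalize(text)
    correct = {name: normalize(asr(audio)) == target for name, asr in asrs.items()}
    if not any(correct.values()):
        # No ASR recognized the audio: blame cannot be assigned to a single ASR.
        return {name: "indeterminable" for name in asrs}
    # At least one ASR succeeded, so every mismatching ASR is a failed test case.
    return {name: "success" if ok else "failed" for name, ok in correct.items()}


def prioritize(texts: Sequence[str], estimator: Callable[[str], float]) -> List[str]:
    """Order texts by the estimator's predicted failure score, highest first."""
    return sorted(texts, key=estimator, reverse=True)


if __name__ == "__main__":
    # Toy TTS/ASR stand-ins for illustration only.
    fake_tts = lambda text: text.encode()
    asrs = {
        "asr_a": lambda audio: audio.decode(),
        "asr_b": lambda audio: audio.decode().replace("night", "knight"),
    }
    print(cross_reference("Good night!", fake_tts, asrs))
    # {'asr_a': 'success', 'asr_b': 'failed'}
```

Ordering candidate texts with `prioritize` before synthesis reflects the role the abstract gives the failure estimator: the tool processes the texts most likely to expose failures first. The scoring model itself is left abstract here.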
