Evaluating Automatic Speech Recognition Systems in Comparison With Human Perception Results Using Distinctive Feature Measures
Xiang Kong, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel

TL;DR
This paper introduces a novel evaluation approach for ASR systems that compares their error patterns with human perception using distinctive feature measures, offering detailed insights beyond traditional metrics.
Contribution
It presents a new method for evaluating ASR systems through distinctive feature-based analysis, enabling detailed comparison with human perception at the sub-phonemic level.
Findings
Error patterns in manner, place, and voicing are analyzed.
Confusion matrices are examined using a distinctive-feature-distance metric.
The method provides a detailed performance profile of ASR systems.
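As a rough illustration of what an error profile over manner, place, and voicing could look like, the sketch below tallies which dimensions differ between reference and recognized consonants. The `FEATURES` table, the `error_profile` helper, and the sample pairs are hypothetical simplifications made up for this example; they are not the feature set or data used in the paper.

```python
from collections import Counter

# Hypothetical, simplified (manner, place, voicing) labels for a few consonants.
# This table is an assumption for illustration, not the paper's feature inventory.
FEATURES = {
    "p": ("stop", "labial", "voiceless"),
    "b": ("stop", "labial", "voiced"),
    "t": ("stop", "alveolar", "voiceless"),
    "d": ("stop", "alveolar", "voiced"),
    "k": ("stop", "velar", "voiceless"),
    "s": ("fricative", "alveolar", "voiceless"),
    "m": ("nasal", "labial", "voiced"),
    "n": ("nasal", "alveolar", "voiced"),
}
DIMENSIONS = ("manner", "place", "voicing")


def error_profile(pairs):
    """Count, per dimension, how many (reference, hypothesis) pairs disagree."""
    counts = Counter({dim: 0 for dim in DIMENSIONS})
    for ref, hyp in pairs:
        for dim, r, h in zip(DIMENSIONS, FEATURES[ref], FEATURES[hyp]):
            if r != h:
                counts[dim] += 1
    return dict(counts)


# Made-up (reference, hypothesis) pairs; the last one is a correct recognition.
pairs = [("p", "b"), ("t", "k"), ("s", "t"), ("m", "n"), ("d", "d")]
profile = error_profile(pairs)
print(profile)
```

Running the same tally on listener responses and on ASR output would give two profiles that can be compared dimension by dimension.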
Abstract
This paper describes methods for evaluating automatic speech recognition (ASR) systems in comparison with human perception results, using measures derived from linguistic distinctive features. Error patterns in terms of manner, place and voicing are presented, along with an examination of confusion matrices via a distinctive-feature-distance metric. These evaluation methods contrast with conventional performance criteria that focus on the phone or word level, and are intended to provide a more detailed profile of ASR system performance, as well as a means for direct comparison with human perception results at the sub-phonemic level.
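The abstract does not define the distinctive-feature-distance metric, so the following is only a minimal sketch under one plausible assumption: the distance between two phones is the number of feature dimensions on which they differ (a Hamming distance), and a confusion matrix is summarized by the count-weighted mean distance of its off-diagonal entries. The `FEATURES` table, both function names, and the confusion counts are hypothetical.

```python
# Hypothetical (manner, place, voicing) labels; an assumption for illustration only.
FEATURES = {
    "p": ("stop", "labial", "voiceless"),
    "b": ("stop", "labial", "voiced"),
    "s": ("fricative", "alveolar", "voiceless"),
}


def feature_distance(a, b):
    """Number of feature dimensions on which phones a and b differ."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


def mean_error_distance(confusions):
    """Count-weighted mean feature distance over the errors in a confusion matrix.

    `confusions` maps (reference, response) pairs to counts; correct
    responses (reference == response) are ignored.
    """
    errors = {pair: n for pair, n in confusions.items() if pair[0] != pair[1]}
    total = sum(errors.values())
    if total == 0:
        return 0.0
    weighted = sum(feature_distance(r, h) * n for (r, h), n in errors.items())
    return weighted / total


# Made-up confusion counts for the reference phone "p".
asr_confusions = {("p", "p"): 90, ("p", "b"): 8, ("p", "s"): 2}
human_confusions = {("p", "p"): 95, ("p", "b"): 5}

asr_score = mean_error_distance(asr_confusions)
human_score = mean_error_distance(human_confusions)
print(asr_score, human_score)
```

Under this assumed definition, a lower score means that errors tend to land on phones sharing more features with the reference, which is one way a system's confusions could be compared with those of human listeners.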
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
