Evaluating Automatic Speech Recognition Systems in Comparison With Human Perception Results Using Distinctive Feature Measures

Xiang Kong; Jeung-Yoon Choi; Stefanie Shattuck-Hufnagel

arXiv:1612.03990·cs.CL·December 14, 2016

TL;DR

This paper introduces an evaluation approach for ASR systems that compares their error patterns with human perception results using distinctive feature measures, giving a sub-phonemic view of performance that phone- and word-level metrics do not capture.

Contribution

It presents a new method for evaluating ASR systems through distinctive feature-based analysis, enabling detailed comparison with human perception at the sub-phonemic level.

Findings

01. Error patterns in manner, place, and voicing are analyzed.

02. Confusion matrices are examined using a distinctive-feature-distance metric.

03. The method provides a detailed performance profile of ASR systems.

Abstract

This paper describes methods for evaluating automatic speech recognition (ASR) systems in comparison with human perception results, using measures derived from linguistic distinctive features. Error patterns in terms of manner, place and voicing are presented, along with an examination of confusion matrices via a distinctive-feature-distance metric. These evaluation methods contrast with conventional performance criteria that focus on the phone or word level, and are intended to provide a more detailed profile of ASR system performance, as well as a means for direct comparison with human perception results at the sub-phonemic level.
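
The idea of scoring a confusion matrix by distinctive-feature distance can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the three-way feature table (`FEATURES`), the Hamming-style distance (`feature_distance`), the helper names, and the toy counts in `EXAMPLE_CONFUSIONS` are all hypothetical. The paper's actual feature inventory and metric may differ.

```python
from typing import Dict, Tuple

# Hypothetical, simplified feature table: (manner, place, voicing).
# A real distinctive-feature inventory is larger; this is illustration only.
FEATURES: Dict[str, Tuple[str, str, str]] = {
    "p": ("stop", "labial", "voiceless"),
    "b": ("stop", "labial", "voiced"),
    "t": ("stop", "alveolar", "voiceless"),
    "d": ("stop", "alveolar", "voiced"),
    "s": ("fricative", "alveolar", "voiceless"),
    "z": ("fricative", "alveolar", "voiced"),
}
DIMENSIONS = ("manner", "place", "voicing")

# Confusion counts keyed by (reference phone, recognized phone).
Confusions = Dict[Tuple[str, str], int]


def feature_distance(ref: str, hyp: str) -> int:
    """Count the feature dimensions on which two phones differ (assumed metric)."""
    return sum(x != y for x, y in zip(FEATURES[ref], FEATURES[hyp]))


def mean_error_distance(confusions: Confusions) -> float:
    """Average feature distance over misrecognized tokens only."""
    total = 0
    errors = 0
    for (ref, hyp), count in confusions.items():
        if ref != hyp:
            total += count * feature_distance(ref, hyp)
            errors += count
    return total / errors if errors else 0.0


def errors_by_dimension(confusions: Confusions) -> Dict[str, int]:
    """Count tokens whose recognized phone differs in manner, place, or voicing."""
    counts = {dim: 0 for dim in DIMENSIONS}
    for (ref, hyp), count in confusions.items():
        for dim, x, y in zip(DIMENSIONS, FEATURES[ref], FEATURES[hyp]):
            if x != y:
                counts[dim] += count
    return counts


# Made-up counts, not data from the paper.
EXAMPLE_CONFUSIONS: Confusions = {
    ("p", "p"): 10,
    ("p", "b"): 3,
    ("p", "t"): 1,
    ("s", "d"): 2,
}

if __name__ == "__main__":
    print(mean_error_distance(EXAMPLE_CONFUSIONS))
    print(errors_by_dimension(EXAMPLE_CONFUSIONS))
```

Under these assumptions, the same two functions could be applied to a confusion matrix from an ASR system and to one from a human perception experiment, so that the two error profiles are compared on the same manner, place, and voicing dimensions rather than by phone or word error rate alone.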

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing