The Performance Evaluation of Attention-Based Neural ASR under Mixed Speech Input
Bradley He, Martin Radfar

TL;DR
This paper evaluates the performance of attention-based neural ASR, specifically Listen, Attend, and Spell (LAS), under mixed speech conditions. It reveals large increases in phoneme error rate at low target-to-interference ratios (TIRs) and offers insight into how the model predicts phonemes when speech is mixed.
Contribution
It provides the first detailed analysis of LAS performance on mixed speech signals and builds a model of the most probable phoneme predictions when two phonemes are mixed, as in cocktail party scenarios.
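The phoneme-prediction model is not specified in this excerpt. One simple way such a model could be built is as an empirical lookup table: for each (target, interferer) phoneme pair, count which phoneme LAS actually outputs and rank those outputs by relative frequency. The sketch below assumes the (target, interferer, predicted) triples have already been obtained by aligning the reference transcripts with the LAS output; the input format, the function name `build_prediction_table`, and the toy observations are all hypothetical and are not data from the paper.

```python
from collections import Counter, defaultdict


def build_prediction_table(observations):
    """Map each (target, interferer) phoneme pair to the phonemes the
    recognizer produced for it, ranked by empirical probability.

    `observations` is an iterable of (target, interferer, predicted)
    triples (a hypothetical input format assumed for this sketch).
    """
    counts = defaultdict(Counter)
    for target, interferer, predicted in observations:
        counts[(target, interferer)][predicted] += 1

    table = {}
    for pair, counter in counts.items():
        total = sum(counter.values())
        # most_common() returns (phoneme, count) pairs sorted by count, descending
        table[pair] = [(phoneme, n / total) for phoneme, n in counter.most_common()]
    return table


# Toy, made-up observations used only to exercise the function
obs = [
    ("aa", "iy", "aa"),
    ("aa", "iy", "aa"),
    ("aa", "iy", "ay"),
    ("s", "sh", "sh"),
]
table = build_prediction_table(obs)
print(table[("aa", "iy")][0][0])  # most probable prediction for the pair: aa
```

A frequency table is the most transparent choice here; a smoothed or context-dependent model would be needed if many phoneme pairs are rarely observed.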
Findings
65% relative increase in PER at TIR=0 dB
Performance approaches original scenario at TIR=30 dB
Model predicts phonemes with higher accuracy during evaluation
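For reference, the quantities behind these findings can be written out explicitly. The definitions below are the standard power-ratio and relative-change forms and are an assumption; the exact normalization used in the paper is not given in this excerpt. Here $s[n]$ is the target speech, $i[n]$ the interfering speech, and $\alpha$ the gain applied to the interferer.

```latex
\mathrm{TIR} = 10\log_{10}\frac{\sum_n s^2[n]}{\sum_n \big(\alpha\, i[n]\big)^2}\ \text{dB},
\qquad x[n] = s[n] + \alpha\, i[n],
\qquad \alpha = \sqrt{\frac{\sum_n s^2[n]}{\sum_n i^2[n]}\; 10^{-\mathrm{TIR}/10}}

\Delta_{\mathrm{rel}} = \frac{\mathrm{PER}_{\mathrm{mix}} - \mathrm{PER}_{\mathrm{clean}}}{\mathrm{PER}_{\mathrm{clean}}}
```

At TIR = 0 dB the two speakers have equal power, and a 65% relative increase means $\mathrm{PER}_{\mathrm{mix}} = 1.65\,\mathrm{PER}_{\mathrm{clean}}$. For a purely hypothetical clean PER of 20%, that would correspond to a mixed-speech PER of 33%.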
Abstract
To evaluate the performance of attention-based neural ASR under noisy conditions, the current trend is to present the model with hours of varied noisy speech data and measure the overall word/phoneme error rate (W/PER). In general, it is unclear how these models perform in a cocktail party setup, in which two or more speakers are active. In this paper, we present mixtures of speech signals to a popular attention-based neural ASR model, Listen, Attend, and Spell (LAS), at different target-to-interference ratios (TIRs) and measure the phoneme error rate. In particular, we investigate in detail which phoneme is predicted when two phonemes are mixed; in this way, we build a model that gives the most probable predictions for a given phoneme. We found a 65% relative increase in PER when LAS was presented with mixed speech signals at TIR = 0 dB and…
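The evaluation protocol in the abstract (mix a target utterance with an interfering one at a chosen TIR, then score the recognized phonemes) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the interferer is scaled by average signal power, both signals are truncated to a common length, and PER is computed as the standard Levenshtein distance (substitutions, deletions, insertions) divided by the reference length. Decoding the mixture with LAS is outside the sketch, and the function names `mix_at_tir` and `phoneme_error_rate` are hypothetical.

```python
import math


def mix_at_tir(target, interference, tir_db):
    """Scale `interference` so that the mixture has the requested
    target-to-interference ratio (in dB), then add it to `target`.

    Assumptions: signals are lists of float samples, the interferer is
    not silent, and both are truncated to their common length.
    """
    n = min(len(target), len(interference))
    p_t = sum(x * x for x in target[:n]) / n        # average target power
    p_i = sum(x * x for x in interference[:n]) / n  # average interferer power
    # Choose alpha so that 10*log10(p_t / (alpha**2 * p_i)) == tir_db
    alpha = math.sqrt(p_t / p_i * 10 ** (-tir_db / 10))
    return [target[k] + alpha * interference[k] for k in range(n)]


def phoneme_error_rate(reference, hypothesis):
    """Edit distance between phoneme sequences divided by the length of
    the (non-empty) reference sequence."""
    prev = list(range(len(hypothesis) + 1))  # distances from an empty reference
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    # Synthetic tones stand in for speech here
    tgt = [math.sin(0.1 * k) for k in range(1000)]
    itf = [0.5 * math.sin(0.37 * k) for k in range(1000)]
    mixture = mix_at_tir(tgt, itf, 0.0)  # equal-power mixture (TIR = 0 dB)
    # One substitution ("sh" recognized as "s") out of three phonemes
    print(phoneme_error_rate(["sh", "iy", "p"], ["s", "iy", "p"]))
```

Sweeping `tir_db` from 0 to 30 and recording the PER at each step reproduces the shape of the experiment summarized in the findings, assuming a recognizer is plugged in between the two functions.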
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques
