Benchmarking Neural Speech Codec Intelligibility with SITool
Anna Leschanowsky, Kishor Kayyar Lakshminarayana, Anjana Rajasekhar, Lyonel Behringer, Ibrahim Kilinc, Guillaume Fuchs, Emanuël A. P. Habets

TL;DR
This paper introduces SITool, a web-based toolkit for standardized speech intelligibility testing, and uses it to benchmark 13 neural and traditional speech codecs. Neural codecs can outperform traditional ones in subjective intelligibility, and STOI and ESTOI, but not WER, correlate significantly with the subjective results.
Contribution
The paper presents SITool, a novel toolkit for subjective intelligibility assessment, and provides a comprehensive benchmark of 13 speech codecs, analyzing correlations with objective metrics.
Findings
Neural codecs can outperform traditional codecs in subjective intelligibility.
STOI and ESTOI correlate significantly with subjective results; WER does not.
Objective metrics struggle to capture gender and wordlist variations.
Abstract
Speech intelligibility assessment is essential for evaluating neural speech codecs, yet most evaluation efforts focus on overall quality rather than intelligibility. Only a few publicly available tools exist for conducting standardized intelligibility tests, like the Diagnostic Rhyme Test (DRT) and Modified Rhyme Test (MRT). We introduce the Speech Intelligibility Toolkit for Subjective Evaluation (SITool), a Flask-based web application for conducting DRT and MRT in laboratory and crowdsourcing settings. We use SITool to benchmark 13 neural and traditional speech codecs, analyzing phoneme-level degradations and comparing subjective DRT results with objective intelligibility metrics. Our findings show that, while neural speech codecs can outperform traditional ones in subjective intelligibility, only STOI and ESTOI - not WER - significantly correlate with subjective results, although…
Taxonomy
Topics: Speech Recognition and Synthesis · Neural Networks and Applications
