Benchmarking Neural Speech Codec Intelligibility with SITool

Anna Leschanowsky; Kishor Kayyar Lakshminarayana; Anjana Rajasekhar; Lyonel Behringer; Ibrahim Kilinc; Guillaume Fuchs; Emanuël A. P. Habets

arXiv:2506.01731 · eess.AS · June 3, 2025 · Interspeech

TL;DR

This paper introduces SITool, a web-based toolkit for standardized speech intelligibility testing, and benchmarks various neural and traditional speech codecs, revealing key insights into their performance and the effectiveness of objective metrics.

Contribution

The paper presents SITool, a novel toolkit for subjective intelligibility assessment, and provides a comprehensive benchmark of 13 speech codecs, analyzing correlations with objective metrics.

Findings

01

Neural codecs can outperform traditional codecs in subjective intelligibility.

02

STOI and ESTOI correlate with subjective results; WER does not.

03

Objective metrics struggle to capture gender and wordlist variations.
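
Finding 02 rests on comparing subjective DRT scores with objective metric values across codecs. The following is a minimal sketch of how such a comparison could be computed, not the procedure used in the paper. It assumes the common chance-corrected DRT score for a two-alternative test, 100 × (correct − incorrect) / total, and uses Pearson correlation as an assumed statistic. The function names `drt_score` and `pearson_r` are hypothetical, and every number is an invented placeholder, not a result from the paper.

```python
import math


def drt_score(correct, incorrect):
    """Chance-corrected DRT score in percent for a two-alternative test."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("at least one response is required")
    return 100.0 * (correct - incorrect) / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder data: (correct, incorrect) response counts per hypothetical codec.
responses = [(95, 5), (90, 10), (80, 20), (70, 30)]
subjective = [drt_score(c, w) for c, w in responses]

# Placeholder objective metric values (for example STOI) for the same codecs.
objective = [0.92, 0.88, 0.75, 0.70]

r = pearson_r(subjective, objective)
print(f"Pearson r between DRT scores and objective metric: {r:.3f}")
```

With real listening-test data, the same comparison would be repeated for each objective metric (STOI, ESTOI, WER) and paired with a significance test before drawing conclusions.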

Abstract

Speech intelligibility assessment is essential for evaluating neural speech codecs, yet most evaluation efforts focus on overall quality rather than intelligibility. Only a few publicly available tools exist for conducting standardized intelligibility tests, like the Diagnostic Rhyme Test (DRT) and Modified Rhyme Test (MRT). We introduce the Speech Intelligibility Toolkit for Subjective Evaluation (SITool), a Flask-based web application for conducting DRT and MRT in laboratory and crowdsourcing settings. We use SITool to benchmark 13 neural and traditional speech codecs, analyzing phoneme-level degradations and comparing subjective DRT results with objective intelligibility metrics. Our findings show that, while neural speech codecs can outperform traditional ones in subjective intelligibility, only STOI and ESTOI - not WER - significantly correlate with subjective results, although…

Taxonomy

Topics: Speech Recognition and Synthesis · Neural Networks and Applications
