Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data

Kentaro Onda; Keisuke Imoto; Satoru Fukayama; Daisuke Saito; Nobuaki Minematsu

arXiv:2505.16182·cs.SD·May 23, 2025

Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data

Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu

PDF

Open Access

TL;DR

This paper demonstrates that discrete token-based ASR models trained solely on native speech data exhibit the interlanguage speech intelligibility benefit, improving recognition of non-native speech without requiring non-native data.

Contribution

It provides an analytical study showing that discrete tokens from SSL models can achieve accent-robust ASR by leveraging the ISIB phenomenon with only native speech data.

Findings

01

ISIB occurs in discrete token-based ASR

02

Native data training improves non-native speech recognition

03

Approach applicable to various accents with scarce data

Abstract

In this study, we gained insight that contributes to achieving accent-robust ASR using only native speech data. In human perception of non-native speech, the phenomenon known as "interlanguage speech intelligibility benefit" (ISIB) is observed, where non-native listeners who share the native language with the speaker understand the speech better compared even to native listeners. Based on the idea that discrete tokens extracted from self-supervised learning (SSL) models represent the human perception of speech, we conducted an analytical study on the robustness of discrete token-based ASR to non-native speech, varying the language used for training the tokenization, which is viewed as a technical implementation of ISIB. The results showed that ISIB actually occurred in the discrete token-based ASR. Since our approach relies only on native speech data to simulate the behavior of human…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsPhonetics and Phonology Research · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation