An Empirical Recipe for Universal Phone Recognition

Shikhar Bharadwaj; Chin-Jou Li; Kwanghee Choi; Eunjung Yeo; William Chen; Shinji Watanabe; David R. Mortensen

arXiv:2603.29042·cs.CL·April 2, 2026

TL;DR

This paper introduces PhoneticXEUS, a multilingual phone recognition model trained on large-scale data, achieving state-of-the-art results and providing insights into factors affecting performance across languages and accents.

Contribution

The paper presents a new training recipe for multilingual phone recognition (PR), evaluates the impact of data scale, self-supervised (SSL) representations, and loss objectives, and analyzes error patterns across diverse speech conditions.

Findings

01. Achieved 17.7% PFER on multilingual speech
02. Achieved 10.6% PFER on accented English
03. Quantified effects of data scale, SSL, and loss functions
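The PFER figures above are easier to interpret with a concrete picture of how such a score can be computed. This page does not define the metric, so the sketch below rests on assumptions: that PFER is a phonetic feature error rate, that it is an edit distance over phone sequences in which a substitution costs the fraction of articulatory features that differ and insertions and deletions cost 1, and that the total is normalized by the number of reference phones. The feature table `TOY_FEATURES` and the function names are hypothetical and purely illustrative. This is not the paper's evaluation code, and its exact weighting may differ.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy feature table: (voiced, nasal, labial, continuant).
# Real evaluations use a full articulatory feature inventory; these
# values exist only to make the example runnable.
TOY_FEATURES: Dict[str, Tuple[int, ...]] = {
    "p": (0, 0, 1, 0),
    "b": (1, 0, 1, 0),
    "m": (1, 1, 1, 0),
    "t": (0, 0, 0, 0),
    "d": (1, 0, 0, 0),
    "s": (0, 0, 0, 1),
    "z": (1, 0, 0, 1),
}


def substitution_cost(a: str, b: str) -> float:
    """Fraction of features that differ between two phones (0.0 to 1.0)."""
    fa, fb = TOY_FEATURES[a], TOY_FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def feature_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance with feature-weighted substitutions (assumed scheme)."""
    # prev[j] is the cost of aligning the reference prefix seen so far
    # with the first j hypothesis phones.
    prev: List[float] = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr: List[float] = [float(i)]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1.0,                          # deletion
                    curr[j - 1] + 1.0,                      # insertion
                    prev[j - 1] + substitution_cost(r, h),  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def pfer(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level score in percent, normalized by reference length."""
    total_distance = sum(feature_edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phones = sum(len(r) for r in refs)
    return 100.0 * total_distance / total_phones


if __name__ == "__main__":
    reference = [["p", "s", "t"]]
    hypothesis = [["b", "s", "t"]]  # /p/ recognized as /b/: voicing differs
    print(f"{pfer(reference, hypothesis):.2f}")
```

Under this scheme, confusing /p/ with /b/ is penalized less than confusing /p/ with /z/, because the first pair differs in one feature and the second in three. That graded penalty is what distinguishes a feature-based score from a plain phone error rate, where every substitution costs the same.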

Abstract

Phone recognition (PR) is a key enabler of multilingual and low-resource speech processing tasks, yet robust performance remains elusive. Highly performant English-focused models do not generalize across languages, while multilingual models underutilize pretrained representations. It also remains unclear how data scale, architecture, and training objective contribute to multilingual PR. We present PhoneticXEUS, a model trained on large-scale multilingual data that achieves state-of-the-art performance on both multilingual speech (17.7% PFER) and accented English speech (10.6% PFER). Through controlled ablations with evaluations across 100+ languages under a unified scheme, we empirically establish our training recipe and quantify the impact of SSL representations, data scale, and loss objectives. In addition, we analyze error patterns across language families, accented speech, and articulatory…

Code & Models

Models

changelinglab/PhoneticXeus (Hugging Face model)
