CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing

Abdul Rehman, Jian-Jun Zhang, and Xiaosong Yang

arXiv:2508.15316·cs.CL·August 22, 2025

TL;DR

CUPE is a lightweight, language-agnostic phoneme encoder that captures essential phonetic features in short, roughly phoneme-length segments, enabling effective cross-lingual speech processing with fewer parameters than current approaches.

Contribution

The paper introduces CUPE, a novel model that processes short speech windows independently to learn universal phoneme features across languages, requiring less data and computational resources.

Findings

01 Achieves competitive cross-lingual performance

02 Effective in zero-shot language transfer

03 Models fundamental acoustic patterns within phoneme-length windows

Abstract

Universal phoneme recognition typically requires analyzing long speech segments and language-specific patterns. Many speech processing tasks require pure phoneme representations free from contextual influence, which motivated our development of CUPE - a lightweight model that captures key phoneme features in just 120 milliseconds, about one phoneme's length. CUPE processes short, fixed-width windows independently and, despite fewer parameters than current approaches, achieves competitive cross-lingual performance by learning fundamental acoustic patterns common to all languages. Our extensive evaluation through supervised and self-supervised training on diverse languages, including zero-shot tests on the UCLA Phonetic Corpus, demonstrates strong cross-lingual generalization and reveals that effective universal speech processing is possible through modeling basic acoustic patterns within…
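The abstract's idea of processing short, fixed-width windows independently can be illustrated with a minimal sketch. This is not the authors' implementation: the 16 kHz sample rate, the non-overlapping stride, the zero-padding of the last window, and the names `split_into_windows` and `encode_independently` are all assumptions made for illustration. Only the 120 ms window width comes from the abstract.

```python
from typing import Callable, List, Sequence


def split_into_windows(
    samples: Sequence[float],
    sample_rate: int = 16000,  # assumed; the abstract does not state a rate
    window_ms: int = 120,      # window width taken from the abstract
    stride_ms: int = 120,      # assumed non-overlapping windows
) -> List[List[float]]:
    """Cut a waveform into fixed-width windows, zero-padding the last one."""
    window_len = sample_rate * window_ms // 1000
    stride = sample_rate * stride_ms // 1000
    if window_len <= 0 or stride <= 0:
        raise ValueError("window and stride must span at least one sample")

    windows: List[List[float]] = []
    for start in range(0, len(samples), stride):
        chunk = list(samples[start:start + window_len])
        # Pad the trailing window so every window has the same width.
        chunk.extend([0.0] * (window_len - len(chunk)))
        windows.append(chunk)
    return windows


def encode_independently(
    windows: List[List[float]],
    encoder: Callable[[List[float]], object],
) -> List[object]:
    """Apply a hypothetical per-window encoder; no window sees its neighbours."""
    return [encoder(window) for window in windows]


if __name__ == "__main__":
    audio = [0.1] * 16000  # one second of placeholder audio
    windows = split_into_windows(audio)
    # Stand-in "encoder": mean amplitude of each window.
    features = encode_independently(windows, lambda w: sum(w) / len(w))
    print(len(windows), len(windows[0]), len(features))
```

Because each window is encoded without reference to its neighbours, the per-window calls could be batched or run in parallel; how CUPE itself batches, overlaps, or pads windows is not specified in this summary.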

Code & Models

Models

Tabahi/CUPE-2i (Hugging Face model)

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Music and Audio Processing