HuPER: A Human-Inspired Framework for Phonetic Perception

Chenxu Guo; Jiachen Lian; Yisi Liu; Baihe Huang; Shriyaa Narayanan; Cheol Jun Cho; Gopala Anumanchipalli

arXiv:2602.01634·eess.AS·February 3, 2026

TL;DR

HuPER is a human-inspired framework that models phonetic perception as adaptive inference. Trained on only 100 hours of data, it achieves state-of-the-art phonetic error rates on five English benchmarks and strong zero-shot transfer to 95 unseen languages.

Contribution

It introduces HuPER, the first framework enabling adaptive, multi-path phonetic perception under diverse acoustic conditions with minimal training data.

Findings

01 Achieves state-of-the-art phonetic error rates on five English benchmarks.

02 Demonstrates strong zero-shot transfer to 95 unseen languages.

03 Enables adaptive perception under diverse acoustic environments.

Abstract

We propose HuPER, a human-inspired framework that models phonetic perception as adaptive inference over acoustic-phonetic evidence and linguistic knowledge. With only 100 hours of training data, HuPER achieves state-of-the-art phonetic error rates on five English benchmarks and strong zero-shot transfer to 95 unseen languages. HuPER is also the first framework to enable adaptive, multi-path phonetic perception under diverse acoustic conditions. All training data, models, and code are open-sourced. Code and a demo are available at https://github.com/HuPER29/HuPER.
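
The abstract reports its results as phonetic error rates, but this page does not describe the scoring protocol. The following is only a minimal sketch of the conventional definition: the Levenshtein edit distance between the reference and predicted phone sequences, divided by the reference length. The function names and the example phone sequences are hypothetical and are not taken from the HuPER code.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference must contain at least one phone")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution and one insertion over 3 phones.
ref = ["k", "æ", "t"]
hyp = ["k", "ɛ", "t", "s"]
per = phone_error_rate(ref, hyp)
```

Under this definition the rate can exceed 1.0 when the hypothesis contains many insertions, which is why it is normalized by the reference length rather than the hypothesis length.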

Code & Models

Datasets

huper29/huper-clean100-proxyphones
dataset · 13 downloads

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing