K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function

Shuhe Li; Chenxu Guo; Jiachen Lian; Cheol Jun Cho; Wenshuo Zhao; Xiner Xu; Ruiyu Jin; Xiaoyu Shi; Xuanru Zhou; Dingkun Zhou; Sam Wang; Grace Wang; Jingze Yang; Jingyi Xu; Ruohan Bao; Xingrui Chen; Elise Brenner; Brandon In; Francesca Pei; Maria Luisa Gorno-Tempini; Gopala Anumanchipalli

arXiv:2507.03043 · cs.CL · February 25, 2026

Open Access

TL;DR

K-Function is a novel framework that combines phoneme-level transcription with LLM-based scoring to evaluate young children's language skills accurately, addressing challenges posed by child speech variability.

Contribution

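It introduces K-WFST (Kids-Weighted Finite State Transducer), which merges an acoustic phoneme encoder with a phoneme-similarity model. K-WFST reaches phoneme error rates of 1.39% on MyST and 8.61% on Multitudes, and its transcripts support LLM-driven assessment of children's language development.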

Findings

01

Achieved 1.39% phoneme error rate on MyST dataset

02

Enabled LLM-based scoring that aligns with human evaluations

03

Improved scalability of language screening for children

Abstract

Evaluating young children's language is challenging for automatic speech recognizers due to high-pitched voices, prolonged sounds, and limited data. We introduce K-Function, a framework that combines accurate sub-word transcription with objective, Large Language Model (LLM)-driven scoring. Its core, the Kids-Weighted Finite State Transducer (K-WFST), merges an acoustic phoneme encoder with a phoneme-similarity model to capture child-specific speech errors while remaining fully interpretable. K-WFST achieves a 1.39% phoneme error rate on MyST and 8.61% on Multitudes, an absolute improvement of 10.47% and 7.06%, respectively, over a greedy-search decoder. These high-quality transcripts are used by an LLM to grade verbal skills, developmental milestones, reading, and comprehension, with results that align closely with human evaluators. Our findings show that precise phoneme recognition is essential for…
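To make the reported metric concrete, here is a minimal sketch of how a phoneme error rate (PER) is commonly computed: the Levenshtein distance between reference and hypothesis phoneme sequences, divided by the reference length. This is the standard definition and is assumed here, not taken from the paper. The phoneme sequences are hypothetical. The last step assumes that "absolute improvement" means a difference in percentage points, in which case the greedy-search baseline PERs can be inferred from the abstract's figures.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER in percent: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a child says "wabbit" for "rabbit" (ARPAbet-style symbols).
ref = ["R", "AE", "B", "IH", "T"]
hyp = ["W", "AE", "B", "IH", "T"]
per = phoneme_error_rate(ref, hyp)  # one substitution over five phonemes

# Assumption: "absolute improvement" is a percentage-point difference, so
# the implied greedy baseline is the K-WFST PER plus the reported gain.
reported = {"MyST": (1.39, 10.47), "Multitudes": (8.61, 7.06)}
implied_greedy_per = {
    name: round(k_wfst + gain, 2) for name, (k_wfst, gain) in reported.items()
}
```

Under that reading, the greedy decoder would sit at roughly 11.86% PER on MyST and 15.67% on Multitudes. These baseline values are derived here and are not stated in the abstract.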

Taxonomy

Topics: Phonetics and Phonology Research