TICL+: A Case Study On Speech In-Context Learning for Children's Speech Recognition

Haolong Zheng; Yekaterina Yegorova; Mark Hasegawa-Johnson

arXiv:2512.18263 · eess.AS · December 23, 2025

Open Access

TL;DR

This paper introduces TICL+, a speech in-context learning method for children's speech recognition that selects in-context examples using both semantic and acoustic similarity. It achieves up to a 53.3% relative word error rate reduction over zero-shot performance and up to 37.6% over baseline TICL.

Contribution

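The paper proposes TICL+, an extension of TICL (Text-Embedding KNN for SICL) that adds an acoustic reranking step to example selection, so that the chosen in-context examples are both semantically and acoustically aligned with the test input. The approach requires no fine-tuning.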
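The two-stage selection described above (text-embedding KNN followed by acoustic reranking) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of cosine similarity in both stages, the parameters `n_candidates` and `k`, and the toy embeddings are all assumptions, and the listing does not say how the embeddings are computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_examples(text_query, audio_query, text_pool, audio_pool,
                    n_candidates, k):
    """Return indices of k pool items: text KNN first, acoustic rerank second."""
    # Stage 1 (retrieval): nearest neighbours in text-embedding space.
    by_text = sorted(range(len(text_pool)),
                     key=lambda i: cosine(text_query, text_pool[i]),
                     reverse=True)
    candidates = by_text[:n_candidates]

    # Stage 2 (assumed form of the rerank): reorder the candidates
    # by similarity in acoustic-embedding space, then keep the top k.
    reranked = sorted(candidates,
                      key=lambda i: cosine(audio_query, audio_pool[i]),
                      reverse=True)
    return reranked[:k]


# Toy 2-D embeddings, invented purely for illustration.
text_pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
audio_pool = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]
text_query = [1.0, 0.0]
audio_query = [1.0, 0.0]

chosen = select_examples(text_query, audio_query, text_pool, audio_pool,
                         n_candidates=3, k=2)
print(chosen)  # [1, 3]
```

In this toy pool, item 2 matches the query acoustically but is never considered because it falls outside the text-based candidate set, while item 0, the closest text match, drops out after reranking because it is acoustically dissimilar. The selected items are those that score well on both criteria.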

Findings

01  TICL+ achieves up to a 53.3% relative word error rate (WER) reduction over zero-shot performance.

02  TICL+ achieves up to a 37.6% relative WER reduction over baseline TICL.

03  Combining semantic and acoustic information enhances ASR robustness.
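For readers unfamiliar with the metric, a relative WER reduction such as the figures quoted above is conventionally computed as the drop in WER divided by the baseline WER. The snippet below is a small sketch of that arithmetic; the function name is hypothetical and the WER values are made up for illustration, not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, system_wer):
    """Relative WER reduction, in percent, of a system over a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical WERs in percent (not results from the paper):
# a baseline at 20.0% WER and a system at 15.0% WER.
print(relative_wer_reduction(20.0, 15.0))  # 25.0
```

A relative reduction is therefore not the same as the absolute difference in WER: in the hypothetical case above the absolute gain is 5 points, while the relative reduction is 25%.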

Abstract

Children's speech recognition remains challenging due to substantial acoustic and linguistic variability, limited labeled data, and significant differences from adult speech. Speech foundation models can address these challenges through Speech In-Context Learning (SICL), allowing adaptation to new domains without fine-tuning. However, the effectiveness of SICL depends on how in-context examples are selected. We extend an existing retrieval-based method, Text-Embedding KNN for SICL (TICL), introducing an acoustic reranking step to create TICL+. This extension prioritizes examples that are both semantically and acoustically aligned with the test input. Experiments on four children's speech corpora show that TICL+ achieves up to a 53.3% relative word error rate reduction over zero-shot performance and 37.6% over baseline TICL, highlighting the value of combining semantic and acoustic…

Taxonomy

Topics: Speech Recognition and Synthesis · Language Development and Disorders · Speech and Audio Processing