Two-Stage Adaptation for Non-Normative Speech Recognition: Revisiting Speaker-Independent Initialization for Personalization

Shan Jiang; Jiawen Qi; Chuanbing Huo; Yingqiang Gao; Qinyu Chen

arXiv:2603.15261 · cs.SD · March 17, 2026

TL;DR

This paper introduces a two-stage adaptation method for non-normative speech recognition: speaker-independent fine-tuning on multi-speaker non-normative data, followed by speaker-specific fine-tuning. Compared with direct speaker-specific fine-tuning from a generic pre-trained model, it consistently improves personalization on AphasiaBank and UA-Speech.

Contribution

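The paper proposes a two-stage adaptation framework for speaker personalization in non-normative ASR. It revisits the initialization strategy: speaker-specific fine-tuning starts from a speaker-independently fine-tuned model rather than from a generic pre-trained one, and this outperforms direct speaker-specific fine-tuning under identical per-speaker conditions.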

Findings

01

Two-stage adaptation improves personalization accuracy relative to direct speaker-specific fine-tuning.

02

Out-of-domain trade-offs, evaluated on the typical-speech datasets TED-LIUM v3 and FLEURS, remain manageable.

03

Improvements are consistent across AphasiaBank and UA-Speech, with both Whisper-Large-v3 and Qwen3-ASR.

Abstract

Personalizing automatic speech recognition (ASR) systems for non-normative speech, such as dysarthric and aphasic speech, is challenging. While speaker-specific fine-tuning (SS-FT) is widely used, it is typically initialized directly from a generic pre-trained model. Whether speaker-independent adaptation provides a stronger initialization prior under such mismatch remains unclear. In this work, we propose a two-stage adaptation framework consisting of speaker-independent fine-tuning (SI-FT) on multi-speaker non-normative data followed by SS-FT, and evaluate it through a controlled comparison with direct SS-FT under identical per-speaker conditions. Experiments on AphasiaBank and UA-Speech with Whisper-Large-v3 and Qwen3-ASR, alongside evaluation on typical-speech datasets TED-LIUM v3 and FLEURS, show that two-stage adaptation consistently improves personalization while maintaining…
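The comparison described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a two-weight linear model stands in for Whisper-Large-v3 or Qwen3-ASR, plain gradient descent on squared error stands in for fine-tuning, and every name and value here (`fine_tune`, `direct_ss_ft`, `two_stage`, `pooled_data`, `speaker_data`) is hypothetical. It shows only the control flow: both paths run the same speaker-specific step and differ solely in their starting weights.

```python
from typing import List, Tuple

# (features, target); a toy stand-in for (audio, transcript) pairs.
Example = Tuple[List[float], float]


def mse(weights: List[float], data: List[Example]) -> float:
    """Mean squared error of the linear model y = w . x on `data`."""
    total = 0.0
    for xs, y in data:
        pred = sum(w * x for w, x in zip(weights, xs))
        total += (pred - y) ** 2
    return total / len(data)


def fine_tune(weights: List[float], data: List[Example],
              lr: float = 0.05, steps: int = 50) -> List[float]:
    """Run `steps` full-batch gradient-descent updates starting from `weights`."""
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xs, y in data:
            err = sum(wi * xi for wi, xi in zip(w, xs)) - y
            for i, xi in enumerate(xs):
                grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def direct_ss_ft(pretrained: List[float],
                 speaker_data: List[Example]) -> List[float]:
    """Baseline: speaker-specific fine-tuning straight from the generic model."""
    return fine_tune(pretrained, speaker_data)


def two_stage(pretrained: List[float], pooled_data: List[Example],
              speaker_data: List[Example]) -> List[float]:
    """SI-FT on pooled multi-speaker data, then SS-FT on one speaker's data."""
    si_weights = fine_tune(pretrained, pooled_data)   # stage 1: SI-FT
    return fine_tune(si_weights, speaker_data)        # stage 2: SS-FT


# Made-up data, assumed purely for illustration.
pretrained = [0.0, 0.0]
pooled_data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.2)]
speaker_data = [([1.0, 0.5], 1.6)]

if __name__ == "__main__":
    print("direct SS-FT loss:", mse(direct_ss_ft(pretrained, speaker_data), speaker_data))
    print("two-stage loss:   ", mse(two_stage(pretrained, pooled_data, speaker_data), speaker_data))
```

Holding the speaker-specific step fixed (same data, learning rate, and step count) mirrors the controlled comparison the abstract describes, so any difference in the final loss comes from the initialization alone.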

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research