Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning

Yangui Fang; Jing Peng; Xu Li; Yu Xi; Chengwei Zhang; Guohui Zhong; Kai Yu

arXiv:2506.05671 · eess.AS · December 24, 2025

Open Access

TL;DR

This paper introduces a text-only fine-tuning method for Speech LLMs that enables effective domain adaptation in low-resource settings without additional audio data, while maintaining source-domain performance and avoiding catastrophic forgetting.

Contribution

It presents a text-only fine-tuning approach that adapts Speech LLMs to new domains using unpaired target-domain text, together with a real-time evaluation mechanism during fine-tuning that preserves speech-text alignment.

Findings

01

Achieves competitive recognition performance on the LibriSpeech, SlideSpeech, and Medical datasets.

02

Maintains source-domain performance with minimal degradation.

03

Enhances generalization to new domains without catastrophic forgetting.
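ASR recognition performance is conventionally reported as word error rate (WER). The page does not say which metric the paper uses, so treat WER here as an assumption. The sketch below computes sentence-level WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative and not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Sentence-level WER: (substitutions + deletions + insertions) / reference words.

    Illustrative sketch; the paper's exact metric and text normalization are not
    stated on this page (assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words (starts as pure insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the patient has a fever", "the patient had fever"))  # 0.4
```

Here "has" → "had" is one substitution and the missing "a" is one deletion, giving 2 errors over 5 reference words. Because insertions are counted, WER can exceed 1.0.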

Abstract

Recent advances in automatic speech recognition (ASR) have combined speech encoders with large language models (LLMs) through projection, forming Speech LLMs with strong performance. However, adapting them to new domains remains challenging, especially in low-resource settings where paired speech-text data is scarce. We propose a text-only fine-tuning strategy for Speech LLMs using unpaired target-domain text without requiring additional audio. To preserve speech-text alignment, we introduce a real-time evaluation mechanism during fine-tuning. This enables effective domain adaptation while maintaining source-domain performance. Experiments on LibriSpeech, SlideSpeech, and Medical datasets show that our method achieves competitive recognition performance, with minimal degradation compared to full audio-text fine-tuning. It also improves generalization to new domains without catastrophic…
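The abstract does not describe how the real-time evaluation mechanism works. One plausible reading, offered purely as an assumption, is a training loop that interleaves text-only updates with periodic recognition checks on a small held-out paired speech-text set. The loop keeps the best checkpoint and stops once recognition stops improving. The stopping rule shown is standard patience-based early stopping, swapped in for illustration. `text_only_finetune`, `train_step`, `evaluate`, `eval_every`, and `patience` are all hypothetical names, not the authors' API.

```python
import copy
from typing import Callable, Iterable, Tuple, TypeVar

S = TypeVar("S")  # model state (hypothetical placeholder type)
B = TypeVar("B")  # one batch of unpaired target-domain text


def text_only_finetune(
    state: S,
    text_batches: Iterable[B],
    train_step: Callable[[S, B], S],
    evaluate: Callable[[S], float],
    eval_every: int = 100,
    patience: int = 3,
) -> Tuple[S, float]:
    """Text-only fine-tuning with periodic recognition checks (hypothetical sketch).

    `evaluate` is assumed to decode a small held-out paired speech-text set and
    return an error rate (lower is better); it is the only place audio is used.
    """
    if eval_every < 1 or patience < 1:
        raise ValueError("eval_every and patience must be positive")
    best_state = copy.deepcopy(state)
    best_score = evaluate(state)  # baseline before any text-only update
    evals_without_gain = 0
    for step, batch in enumerate(text_batches, start=1):
        state = train_step(state, batch)  # update on text only, no audio
        if step % eval_every != 0:
            continue
        score = evaluate(state)
        if score < best_score:
            # Recognition improved: keep a copy of this checkpoint.
            best_state, best_score = copy.deepcopy(state), score
            evals_without_gain = 0
        else:
            # No improvement: stop after `patience` consecutive checks, on the
            # assumption that speech-text alignment is starting to erode.
            evals_without_gain += 1
            if evals_without_gain >= patience:
                break
    return best_state, best_score


# Toy usage: the "model" is an integer whose error is smallest at 5.
print(text_only_finetune(0, range(20), lambda s, b: s + 1,
                         lambda s: float(abs(s - 5)), eval_every=1, patience=2))
```

With `eval_every=1` the toy run returns `(5, 0.0)`. A coarser `eval_every` is cheaper but can skip past the best checkpoint, which illustrates why evaluation frequency matters when the goal is to catch alignment loss as it happens.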

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Domain Adaptation and Few-Shot Learning