Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning

Chirag Nagpal, Subhashini Venugopalan, Jimmy Tobin, Marilyn Ladewig, Katherine Heller, Katrin Tomanek

arXiv:2501.00039 · eess.AS · January 3, 2025

TL;DR

This paper presents an approach for adapting large language models (LLMs) to speech recognition, particularly of disordered speech, using reinforcement learning with custom rewards as an alternative to conventional supervised fine-tuning.

Contribution

It introduces a method that integrates audio tokens into an LLM's vocabulary and then tunes the model with reinforcement learning, which adapts it to disordered speech better than supervised fine-tuning, particularly when adapting to speech in a different setting.
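As a rough illustration of audio token integration, the sketch below follows the abstract's description of replacing low-frequency text tokens with audio tokens: it ranks the ids of a toy vocabulary by corpus frequency and reassigns the rarest ones to discrete audio codes. This is a hypothetical sketch, not the authors' implementation. The function names `replace_rare_tokens` and `encode_audio`, the `<audio_k>` naming, and the assumption that audio arrives as integer codes from some separate audio tokenizer are all illustrative.

```python
from collections import Counter


def replace_rare_tokens(vocab, corpus_token_ids, num_audio_tokens):
    """Reassign the ids of the least frequent text tokens to audio tokens.

    vocab: list of token strings indexed by token id (a toy stand-in).
    corpus_token_ids: token ids observed in some text corpus, used for counts.
    Returns the updated vocab and a map from audio code -> reused token id.
    """
    if num_audio_tokens > len(vocab):
        raise ValueError("cannot reclaim more tokens than the vocabulary holds")
    counts = Counter(corpus_token_ids)
    # Rank ids from rarest to most frequent; unseen ids count as 0. Ties break by id.
    ranked = sorted(range(len(vocab)), key=lambda i: (counts[i], i))
    reclaimed = sorted(ranked[:num_audio_tokens])
    new_vocab = list(vocab)
    audio_code_to_id = {}
    for code, token_id in enumerate(reclaimed):
        new_vocab[token_id] = f"<audio_{code}>"
        audio_code_to_id[code] = token_id
    return new_vocab, audio_code_to_id


def encode_audio(audio_codes, audio_code_to_id):
    """Turn a sequence of discrete audio codes into LLM input token ids."""
    return [audio_code_to_id[c] for c in audio_codes]


if __name__ == "__main__":
    vocab = ["the", "a", "cat", "sat", "zyx", "qqq"]
    corpus = [0, 0, 0, 1, 1, 2, 3, 3]  # "zyx" (id 4) and "qqq" (id 5) never appear
    new_vocab, mapping = replace_rare_tokens(vocab, corpus, num_audio_tokens=2)
    print(new_vocab)                         # ['the', 'a', 'cat', 'sat', '<audio_0>', '<audio_1>']
    print(encode_audio([1, 0, 1], mapping))  # [5, 4, 5]
```

Reusing existing ids keeps the vocabulary size unchanged, so the embedding table does not need to grow; the repurposed rows would presumably be learned when the model is fine-tuned on speech with transcripts.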

Findings

1. Reinforcement learning with custom rewards improves adaptation to disordered speech.
2. The RL-tuned model outperforms supervised fine-tuning of the same LLM when adapting to speech in a different setting, although it does not surpass existing speech recognition systems.
3. The approach offers a new tuning strategy for large language models in speech tasks.

Abstract

We introduce a large language model (LLM) capable of processing speech inputs and show that tuning it further with reinforcement learning on human preference (RLHF) enables it to adapt better to disordered speech than traditional fine-tuning. Our method replaces low-frequency text tokens in an LLM's vocabulary with audio tokens and enables the model to recognize speech by fine-tuning it on speech with transcripts. We then use RL with rewards based on syntactic and semantic accuracy measures to further generalize the LLM to disordered speech. While the resulting LLM does not outperform existing systems for speech recognition, we find that tuning with reinforcement learning using custom rewards leads to substantially better performance than supervised fine-tuning of the language model, specifically when adapting to speech in a different setting. This presents a compelling…
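The abstract mentions rewards built from syntactic and semantic accuracy measures but does not name them here. The sketch below is an assumption-laden stand-in: it uses word error rate (WER) as the syntactic term and a bag-of-words F1 as a placeholder for the semantic term, mixed with a hypothetical weight `alpha`. The paper's actual metrics and weighting may differ, and `transcript_reward`, `token_f1`, and `normalize` are illustrative names.

```python
from collections import Counter


def normalize(text):
    # Lowercase and split on whitespace; real ASR scoring usually also strips punctuation.
    return text.lower().split()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        # WER is undefined for an empty reference; treat any output as fully wrong.
        return float(len(hyp) > 0)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1] / len(ref)


def token_f1(reference, hypothesis):
    """Hypothetical semantic proxy: F1 over word multisets, ignoring order."""
    ref, hyp = Counter(normalize(reference)), Counter(normalize(hypothesis))
    overlap = sum((ref & hyp).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def transcript_reward(reference, hypothesis, alpha=0.5):
    """Weighted mix of a syntactic term (1 - clipped WER) and a semantic term."""
    syntactic = 1.0 - min(word_error_rate(reference, hypothesis), 1.0)
    semantic = token_f1(reference, hypothesis)
    return alpha * syntactic + (1.0 - alpha) * semantic


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    for hyp in ["the cat sat on the mat", "the cat sat on mat", "a dog ran"]:
        print(f"{hyp!r}: reward={transcript_reward(ref, hyp):.3f}")
```

In an RL loop, a score like this would be computed for each transcript sampled from the model and used as its reward. Clipping WER at 1 keeps the syntactic term in [0, 1] even when a hypothesis contains many insertions, so both terms share the same scale before weighting.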

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis