Speechworthy Instruction-tuned Language Models

Hyundong Cho; Nicolaas Jedema; Leonardo F.R. Ribeiro; Karishma Sharma; Pedro Szekely; Alessandro Moschitti; Ruben Janssen; Jonathan May

arXiv:2409.14672 · cs.AI · September 24, 2024

TL;DR

This paper adapts instruction-tuned language models to speech using prompting strategies grounded in radio-industry best practices and preference learning on a 20K-sample speech-based preference dataset. Each method improves the speech-suitability of generated responses, and combining them works best.

Contribution

It introduces a novel speech-based preference dataset of 20K samples, labeled by annotators who listen to response pairs, and shows that combining prompting with preference learning yields the best win rates in head-to-head comparison.

Findings

01. Combining prompting and preference learning achieves a 76.2% win rate in head-to-head preference comparisons.

02. Prompting and preference learning each improve speech-suitability on their own.

03. Analyses reveal that each method contributes to response quality in a distinct way.
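The win rate reported above is a head-to-head statistic. As a minimal sketch of how such a number can be computed from pairwise judgments, the snippet below assumes hypothetical labels `"win"`, `"tie"`, and `"loss"` recorded from the perspective of the evaluated model. This page does not say how ties are handled, so the `count_ties_as_wins` flag is an assumption for illustration, not the authors' evaluation protocol.

```python
from collections import Counter


def win_rate(judgments, count_ties_as_wins=False):
    """Fraction of pairwise comparisons won by the evaluated model.

    `judgments` is a list of hypothetical labels: "win", "tie", or "loss".
    Whether ties count toward the rate is an assumption exposed as a flag.
    """
    counts = Counter(judgments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments to score")
    wins = counts["win"]
    if count_ties_as_wins:
        wins += counts["tie"]
    return wins / total


# Toy data, not results from the paper.
example = ["win", "win", "tie", "loss"]
print(win_rate(example))
print(win_rate(example, count_ties_as_wins=True))
```

Reporting both variants side by side makes it clear how much of a headline figure depends on the treatment of ties.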

Abstract

Current instruction-tuned language models are exclusively trained with textual preference data and thus are often not aligned with the unique requirements of other modalities, such as speech. To better align language models with the speech domain, we explore (i) prompting strategies grounded in radio-industry best practices and (ii) preference learning using a novel speech-based preference dataset of 20K samples, generated with a wide spectrum of prompts that induce varying dimensions of speech-suitability and labeled by annotators who listen to response pairs. Human and automatic evaluation both show that prompting and preference learning each increase the speech-suitability of popular instruction-tuned LLMs. Interestingly, we find that prompting and preference learning can be additive; combining them achieves the best win rates in head-to-head comparison, resulting in responses that are…

Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis

Methods: Balanced Selection · ALIGN