Aligning Spoken Dialogue Models from User Interactions

Anne Wu; Laurent Mazaré; Neil Zeghidour; Alexandre Défossez

arXiv:2506.21463·cs.CL·June 27, 2025

TL;DR

This paper introduces a preference alignment framework that uses user interaction data to improve spoken dialogue models in real-time conversations, accounting for speech dynamics such as interruptions and interjections.

Contribution

The paper presents a dataset of more than 150,000 preference pairs, annotated with AI feedback, and uses offline alignment methods to finetune a full-duplex speech-to-speech model, improving factuality, safety, and contextual relevance.

Findings

01. Feedback improves dialogue model performance

02. Large-scale annotated speech dataset created

03. Enhanced real-time speech interaction quality

Abstract

We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models, and are not directly suited to the complexities of real-time speech interactions, with richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speaker turns. We create a large-scale dataset of more than 150,000 preference pairs from raw multi-turn speech conversations, annotated with AI feedback, to cover preferences over both linguistic content and temporal context variations. We leverage offline alignment methods to finetune a full-duplex autoregressive speech-to-speech model. Extensive experiments demonstrate that feedback on generic conversations can be consistently effective in improving spoken dialogue models to produce more factual,…
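The abstract does not say which offline alignment objective is used. As an illustration only, the sketch below computes the loss of Direct Preference Optimization (DPO), one widely used offline method, for a single preference pair. The function name, argument names, and the beta value are assumptions for this example and are not taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or a frozen reference model.
    All names and the default beta are hypothetical, for illustration.
    """
    # How much more the policy favors the chosen response over the
    # rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: the margin is zero.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy shifted toward the chosen response: the loss decreases.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

When the policy matches the reference, the margin is zero and the loss equals log 2. Raising the probability of the preferred response relative to the rejected one lowers the loss, which is the signal a preference pair provides during finetuning.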

Videos

Aligning Spoken Dialogue Models from User Interactions · SlidesLive

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Multi-Agent Systems and Negotiation
