On the Use of Audio to Improve Dialogue Policies

Daniel Roncel; Federico Costa; Javier Hernando

arXiv:2410.13385·eess.AS·October 18, 2024

TL;DR

This paper introduces dialogue policy architectures that integrate audio and text embeddings using Double Multi-Head Attention, improving on text-only policies, especially when transcriptions are noisy.

Contribution

The paper proposes dialogue policy architectures that combine speech and text embeddings through a Double Multi-Head Attention component.

Findings

01. Audio embedding-aware policies outperform text-only models.

02. 9.8% relative improvement in User Request Score on DSTC2.

03. Combining text and audio embeddings effectively enhances robustness.

Abstract

With the significant progress of speech technologies, spoken goal-oriented dialogue systems are becoming increasingly popular. One of the main modules of a dialogue system is typically the dialogue policy, which is responsible for determining system actions. This component usually relies only on audio transcriptions, being strongly dependent on their quality and ignoring very important extralinguistic information embedded in the user's speech. In this paper, we propose new architectures to add audio information by combining speech and text embeddings using a Double Multi-Head Attention component. Our experiments show that audio embedding-aware dialogue policies outperform text-based ones, particularly in noisy transcription scenarios, and that how text and audio embeddings are combined is crucial to improve performance. We obtained a 9.8% relative improvement in the User Request Score…
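The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea, under stated assumptions: a sequence of audio frame embeddings is pooled with two stacked attention steps (first over time within each head, then across heads), and the pooled audio vector is concatenated with a text embedding before a linear layer and softmax over system actions. The function names, the `params` layout, the scaling by the square root of the head size, and the concatenation-based fusion are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def double_mha_pool(seq, head_queries, pool_query):
    """Pool a (T, D) sequence into a single (D // K,) vector.

    seq          : (T, D) frame-level embeddings (e.g. audio features)
    head_queries : (K, D // K) one learnable query per head (assumed)
    pool_query   : (D // K,) learnable query for the second attention (assumed)
    """
    T, D = seq.shape
    K, d = head_queries.shape
    assert K * d == D, "embedding size must split evenly across heads"

    # First attention: each head attends over time on its own slice.
    heads = seq.reshape(T, K, d)                                   # (T, K, d)
    scores = np.einsum("tkd,kd->tk", heads, head_queries) / np.sqrt(d)
    w_time = softmax(scores, axis=0)                               # sums to 1 over T
    contexts = np.einsum("tk,tkd->kd", w_time, heads)              # (K, d)

    # Second attention: weigh the K head-level context vectors.
    w_head = softmax(contexts @ pool_query / np.sqrt(d), axis=0)   # (K,)
    return w_head @ contexts                                       # (d,)


def policy_step(audio_frames, text_emb, params):
    """Hypothetical policy head: fuse by concatenation, then linear + softmax."""
    audio_vec = double_mha_pool(
        audio_frames, params["head_queries"], params["pool_query"]
    )
    fused = np.concatenate([text_emb, audio_vec])
    logits = params["W"] @ fused + params["b"]
    return softmax(logits)  # distribution over system actions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, K, text_dim, n_actions = 50, 16, 4, 8, 5
    d = D // K
    params = {
        "head_queries": rng.normal(size=(K, d)),
        "pool_query": rng.normal(size=d),
        "W": rng.normal(size=(n_actions, text_dim + d)),
        "b": np.zeros(n_actions),
    }
    probs = policy_step(rng.normal(size=(T, D)), rng.normal(size=text_dim), params)
    print(probs.shape, float(probs.sum()))
```

In a trained system the queries and the linear layer would be learned jointly with the rest of the policy; here they are random, so the output only demonstrates shapes and data flow.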

Code & Models

Repositories

danielroncel/tfm
none · Official

Taxonomy

Topics: Conflict Management and Negotiation · Team Dynamics and Performance · Language, Discourse, Communication Strategies

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention