Joint Learning of Context and Feedback Embeddings in Spoken Dialogue
Livia Qian, Gabriel Skantze

TL;DR
This paper introduces a contrastive learning approach to embed dialogue contexts and feedback responses in a shared space, improving feedback appropriateness ranking in spoken dialogue systems.
Contribution
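It proposes a joint embedding method that places dialogue contexts and feedback responses in a shared representation space, trained with a contrastive objective, so that the embeddings reflect the conversational function of feedback responses.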
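This page does not specify the exact form of the contrastive objective. A common choice for aligning two kinds of input in one space is a symmetric InfoNCE (CLIP-style) loss. The sketch below assumes that choice. Each context embedding is pulled toward the embedding of the feedback response that actually followed it and pushed away from the other feedback responses in the same batch. The function names, the cosine scoring, and the temperature of 0.1 are illustrative assumptions rather than the authors' settings. The encoders that would produce the vectors are omitted.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def log_softmax_at(scores: List[float], index: int) -> float:
    """log(softmax(scores)[index]), computed stably by subtracting the max."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[index] - log_sum


def symmetric_info_nce(contexts: List[Vector], feedbacks: List[Vector],
                       temperature: float = 0.1) -> float:
    """Hypothetical CLIP-style contrastive loss over a batch of aligned pairs.

    Assumption: contexts[i] and feedbacks[i] come from the same dialogue
    (a positive pair), and every other item in the batch acts as a negative.
    """
    n = len(contexts)
    assert n == len(feedbacks) and n > 0
    # Temperature-scaled similarity matrix: rows are contexts, columns feedbacks.
    sims = [[cosine(c, f) / temperature for f in feedbacks] for c in contexts]
    # Context -> feedback: row i should score highest at column i.
    loss_c2f = -sum(log_softmax_at(sims[i], i) for i in range(n)) / n
    # Feedback -> context: column j should score highest at row j.
    loss_f2c = -sum(
        log_softmax_at([sims[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_c2f + loss_f2c) / 2
```

Shuffling the pairs within a batch should raise the loss sharply, which gives a quick sanity check for an implementation of this kind.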
Findings
The model outperforms humans on the same feedback ranking task.
The learned embeddings encode the conversational function of feedback responses.
The embeddings can serve as a context-feedback appropriateness metric for ranking feedback responses.
Abstract
Short feedback responses, such as backchannels, play an important role in spoken dialogue. So far, most of the modeling of feedback responses has focused on their timing, often neglecting how their lexical and prosodic form influences their contextual appropriateness and conversational function. In this paper, we investigate the possibility of embedding short dialogue contexts and feedback responses in the same representation space using a contrastive learning objective. In our evaluation, we primarily focus on how such embeddings can be used as a context-feedback appropriateness metric and thus for feedback response ranking in U.S. English dialogues. Our results show that the model outperforms humans given the same ranking task and that the learned embeddings carry information about the conversational function of feedback responses.
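Once contexts and feedback responses share one space, a similarity score between a context and a candidate response can act as the appropriateness metric described in the abstract. The sketch below assumes cosine similarity as that score and ranks candidate feedback responses for a single context. The function name `rank_feedback` is hypothetical, and the toy three-dimensional vectors in the usage note stand in for real encoder outputs, which the page does not describe.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_feedback(context: Vector,
                  candidates: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Hypothetical ranker: sort candidate feedback responses, best first.

    Assumption: appropriateness is scored as the cosine similarity between
    the context embedding and each candidate's embedding in the shared space.
    """
    scored = [(name, cosine(context, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy embeddings, made up for illustration only.
    context = [0.9, 0.1, 0.0]
    candidates = {
        "yeah": [0.8, 0.2, 0.0],
        "oh really?": [0.1, 0.9, 0.1],
        "mhm": [0.5, 0.5, 0.0],
    }
    for name, score in rank_feedback(context, candidates):
        print(f"{name}: {score:.3f}")
```

Because the score is computed from embeddings of the response itself, two responses with the same words but different prosody could receive different scores, provided that the feedback encoder takes audio as input.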
Taxonomy
Topics: Speech and dialogue systems
Methods: Contrastive Learning (focus)
