Multimodal Dyadic Impression Recognition via Listener Adaptive Cross-Domain Fusion

Yuanchao Li; Peter Bell; Catherine Lai

arXiv:2211.05163 · cs.MM · February 17, 2023

Open Access

TL;DR

This paper introduces a listener adaptive cross-domain fusion approach for dyadic impression recognition, effectively modeling speaker-listener interactions to improve perception accuracy in conversational AI.

Contribution

It proposes a novel listener adaptive cross-domain architecture that captures the causal relationship between speaker and listener behaviors, enhancing impression recognition.

Findings

01. Achieved concordance correlation coefficients (CCC) of 78.8% and 77.5% on the competence and warmth dimensions, respectively.

02. Outperformed previous methods on the dyadic IMPRESSION dataset.

03. Demonstrated potential for generalization to similar dyadic interactions.

Abstract

As a sub-branch of affective computing, impression recognition, e.g., perception of speaker characteristics such as warmth or competence, is potentially a critical part of both human-human conversations and spoken dialogue systems. Most research has studied impressions only from the behaviors expressed by the speaker or the response from the listener, yet ignored their latent connection. In this paper, we perform impression recognition using a proposed listener adaptive cross-domain architecture, which consists of a listener adaptation function to model the causality between speaker and listener behaviors and a cross-domain fusion function to strengthen their connection. The experimental evaluation on the dyadic IMPRESSION dataset verified the efficacy of our method, producing concordance correlation coefficients of 78.8% and 77.5% in the competence and warmth dimensions, outperforming…
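
The abstract reports results as concordance correlation coefficients (CCC). As a reading aid, the following is a minimal sketch of the standard CCC definition (Lin's coefficient) using population statistics. It is not the authors' evaluation code; the function name `concordance_cc` and the toy inputs are illustrative assumptions. A reported value such as 78.8% corresponds to a CCC of 0.788 on this scale.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Standard concordance correlation coefficient (hypothetical helper).

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    computed with population (biased) variance and covariance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()  # population variance (ddof=0)
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()

    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


if __name__ == "__main__":
    # Toy annotations and predictions, invented purely for illustration.
    labels = [1.0, 2.0, 3.0]
    shifted = [2.0, 3.0, 4.0]  # perfectly correlated but offset by 1
    print(concordance_cc(labels, labels))
    print(concordance_cc(labels, shifted))
```

Unlike Pearson correlation, CCC penalizes a constant offset or scale mismatch between predictions and annotations, which is why the shifted toy predictions score below the identical ones even though the two series are perfectly correlated.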

Taxonomy

Topics: Speech and dialogue systems · Speech and Audio Processing · Emotion and Mood Recognition