An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue

Koji Inoue; Divesh Lala; Mikey Elmers; Keiko Ochi; Tatsuya Kawahara

arXiv:2501.16643 · cs.CL · March 19, 2025

Open Access

TL;DR

This paper introduces a new multi-modal multi-party dialogue corpus and benchmarks addressee recognition, revealing significant challenges for large language models like GPT-4o in understanding multi-party conversations.

Contribution

It presents a novel multi-party dialogue dataset with addressee annotations and evaluates LLM performance, highlighting gaps in current models' understanding of multi-party interactions.

Findings

01. Explicit addressees appear in about 20% of turns.
02. GPT-4o's accuracy is only slightly above chance.
03. Current models struggle with multi-party dialogue comprehension.

Abstract

Handling multi-party dialogues represents a significant step for advancing spoken dialogue systems, necessitating the development of tasks specific to multi-party interactions. To address this challenge, we are constructing a multi-modal multi-party dialogue corpus of triadic (three-participant) discussions. This paper focuses on the task of addressee recognition, identifying who is being addressed to take the next turn, a critical component unique to multi-party dialogue systems. A subset of the corpus was annotated with addressee information, revealing that explicit addressees are indicated in approximately 20% of conversational turns. To evaluate the task's complexity, we benchmarked the performance of a large language model (GPT-4o) on addressee recognition. The results showed that GPT-4o achieved an accuracy only marginally above chance, underscoring the challenges of addressee…
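To make the phrase "only marginally above chance" concrete, here is a minimal sketch of how addressee predictions could be scored against two simple reference points. Everything in it is an assumption for illustration: the label set (`"B"`, `"C"`, `"none"`), the toy gold and predicted labels, and the function names are hypothetical, and the abstract does not say which definition of chance the authors used.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of turns where the predicted addressee matches the annotation."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def uniform_chance(labels):
    """Expected accuracy of guessing uniformly among the observed label types."""
    return 1 / len(set(labels))


def majority_baseline(gold):
    """Accuracy of always predicting the most frequent gold label."""
    return Counter(gold).most_common(1)[0][1] / len(gold)


# Toy data, invented for illustration only (not from the corpus).
# "none" stands for a turn with no explicit addressee.
gold = ["B", "C", "none", "none", "B", "none", "C", "none", "none", "none"]
pred = ["B", "none", "none", "C", "B", "none", "none", "none", "B", "none"]

print(f"model accuracy:    {accuracy(gold, pred):.2f}")
print(f"uniform chance:    {uniform_chance(gold):.2f}")
print(f"majority baseline: {majority_baseline(gold):.2f}")
```

When one label dominates, as it would if most turns carry no explicit addressee, the majority baseline can sit well above uniform chance, so it is worth reporting which reference point an accuracy figure is compared against.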

Taxonomy

Topics: Speech and dialogue systems