Do LLMs suffer from Multi-Party Hangover? A Diagnostic Approach to Addressee Recognition and Response Selection in Conversations

Nicolò Penzo; Maryam Sajedinia; Bruno Lepri; Sara Tonelli; Marco Guerini

arXiv:2409.18602·cs.CL·March 30, 2026

TL;DR

This paper introduces a diagnostic pipeline to evaluate LLM performance on addressee recognition and response selection in multi-party conversations, emphasizing structural and linguistic factors.

Contribution

It proposes a new methodology for analyzing model weaknesses across conversation structures and introduces diagnostic datasets for targeted evaluation.

Findings

01 Response selection depends more on textual content.

02 Addressee recognition requires understanding conversation structure.

03 LLMs show task-dependent sensitivity to prompt variations.

Abstract

Assessing the performance of systems to classify Multi-Party Conversations (MPC) is challenging due to the interconnection between linguistic and structural characteristics of conversations. Conventional evaluation methods often overlook variances in model behavior across different levels of structural complexity on interaction graphs. In this work, we propose a methodological pipeline to investigate model performance across specific structural attributes of conversations. As a proof of concept we focus on Response Selection and Addressee Recognition tasks, to diagnose model weaknesses. To this end, we extract representative diagnostic subdatasets with a fixed number of users and a good structural variety from a large and open corpus of online MPCs. We further frame our work in terms of data minimization, avoiding the use of original usernames to preserve privacy, and propose…
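The two ideas the abstract names, replacing original usernames and describing a conversation by its interaction graph, can be pictured with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the message fields (`speaker`, `addressee`, `text`), the `USER_n` placeholder scheme, the function names, and the choice of structural attributes are all hypothetical.

```python
from collections import Counter

# Hypothetical toy conversation; the field names are assumptions for illustration.
conversation = [
    {"speaker": "alice", "addressee": None, "text": "Has anyone tried the new release?"},
    {"speaker": "bob", "addressee": "alice", "text": "Yes, it works for me."},
    {"speaker": "carol", "addressee": "alice", "text": "It crashed on my machine."},
    {"speaker": "alice", "addressee": "carol", "text": "Which OS are you on?"},
]


def pseudonymize(messages):
    """Replace usernames with placeholder IDs assigned in order of first appearance."""
    ids = {}

    def alias(name):
        if name is None:
            return None
        if name not in ids:
            ids[name] = f"USER_{len(ids) + 1}"
        return ids[name]

    anonymized = []
    for m in messages:
        speaker = alias(m["speaker"])
        addressee = alias(m["addressee"])
        # Usernames mentioned inside the text itself are NOT handled in this sketch.
        anonymized.append({"speaker": speaker, "addressee": addressee, "text": m["text"]})
    return anonymized


def structure_summary(messages):
    """Summarize the interaction graph: users are nodes, speaker -> addressee are edges."""
    users = set()
    edges = Counter()
    for m in messages:
        users.add(m["speaker"])
        if m["addressee"] is not None:
            users.add(m["addressee"])
            edges[(m["speaker"], m["addressee"])] += 1
    # In-degree here counts distinct speakers who addressed a given user.
    in_degree = Counter(dst for (_, dst) in edges)
    return {
        "n_users": len(users),
        "n_edges": len(edges),
        "max_in_degree": max(in_degree.values(), default=0),
    }


anonymized = pseudonymize(conversation)
summary = structure_summary(anonymized)
print(summary)
```

A summary like this could serve as a filter: keeping only conversations with a fixed `n_users` and then grouping them by graph shape is one way to picture selecting subsets with a fixed number of users and varied structure.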
