MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games
Jacob Eisenstein, Fantine Huot, Adam Fisch, Jonathan Berant, Mirella Lapata

TL;DR
This paper introduces MT-PingEval, a scalable framework for assessing how well language models perform in multi-turn collaborative games involving private information, revealing current limitations in their conversational planning and coherence.
Contribution
The paper proposes a new evaluation methodology for multi-turn collaboration over private information, and uses it to highlight weaknesses in models' planning and dialogue coherence relative to humans.
Findings
Models often fail to improve over non-interactive baselines in collaborative tasks.
Humans produce more coherent and efficient dialogues than current models.
Models show significant weaknesses in managing private information proactively.
Abstract
We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective communication about private information. This enables an interactive scaling analysis, in which a fixed token budget is divided over a variable number of turns. We find that in many cases, language models are unable to use interactive collaboration to improve over the non-interactive baseline scenario in which one agent attempts to summarize its information and the other agent immediately acts -- despite substantial headroom. This suggests that state-of-the-art models still suffer from significant weaknesses in planning and executing multi-turn collaborative conversations. We analyze the linguistic features of these dialogues, assessing the roles of sycophancy, information density, and discourse coherence. While there is no single…
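The interactive scaling analysis described in the abstract holds the total token budget fixed and varies the number of turns. The sketch below shows one way to set up such a grid. It assumes an even split of the budget across turns, which the abstract does not specify, and the names `split_budget` and `scaling_grid` are hypothetical and not taken from the paper.

```python
from typing import Dict, Iterable, List


def split_budget(total_tokens: int, num_turns: int) -> List[int]:
    """Divide a fixed token budget across turns as evenly as possible.

    Assumption: an even split. The paper's actual allocation rule is not
    stated in the abstract.
    """
    if num_turns < 1:
        raise ValueError("num_turns must be at least 1")
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    base, extra = divmod(total_tokens, num_turns)
    # The first `extra` turns get one additional token so the sum is exact.
    return [base + 1 if i < extra else base for i in range(num_turns)]


def scaling_grid(total_tokens: int, turn_counts: Iterable[int]) -> Dict[int, List[int]]:
    """Map each turn count to its per-turn token allocation."""
    return {turns: split_budget(total_tokens, turns) for turns in turn_counts}


if __name__ == "__main__":
    # One turn corresponds to the non-interactive baseline: a single message,
    # after which the other agent acts immediately.
    for turns, allocation in scaling_grid(1000, [1, 2, 4, 8]).items():
        print(turns, allocation)
```

Under this setup, the single-turn row plays the role of the non-interactive baseline, and every other row spends the same total budget across more, shorter turns.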
Taxonomy
Topics: Topic Modeling · Language and cultural evolution · Speech and dialogue systems
