SayNext-Bench: Why Do LLMs Struggle with Next-Utterance Anticipation?

Yueyi Yang; Haotian Liu; Fang Kang; Mengqi Zhang; Zheng Lian; Hao Tang; Haoyu Chen

arXiv:2602.00327 · cs.AI · May 12, 2026

TL;DR

This paper introduces SayNext-Bench, a benchmark and dataset for evaluating large language models on next-utterance anticipation in dialogue, highlighting the importance of multimodal cues and proposing a new cognitively inspired model.

Contribution

The paper presents a new benchmark (SayNext-Bench), a large-scale multimodal dialogue dataset (SayNext-PC), and a dual-route MLLM that incorporates perceptual cues, advancing the understanding of multimodal anticipation in dialogue models.

Findings

01 SayNext-Chat outperforms state-of-the-art MLLMs across evaluation levels.

02 Multimodal cues significantly improve next-utterance anticipation.

03 Active anticipatory processing is crucial for natural human-like dialogue.

Abstract

We explore the use of large language models (LLMs) for next-utterance anticipation in human dialogue. Although recent LLMs can engage in natural conversations with users, we show that even leading models struggle to anticipate a human speaker's next utterance. Humans, in contrast, readily anticipate forthcoming utterances from multimodal cues in the context, such as gestures, gaze, and emotional tone. To systematically examine this gap, we propose SayNext-Bench, a benchmark that evaluates MLLMs on anticipating context-conditioned responses across diverse real-world scenarios. To support it, we build SayNext-PC, a large-scale multimodal dialogue dataset, and design a multi-level evaluation framework spanning lexical similarity, emotion-intention consistency, and LLM-based overall alignment. Building on this, we develop…
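
As a rough illustration of the lexical-similarity level of the evaluation framework named in the abstract, the sketch below scores a predicted utterance against the reference utterance with a token-overlap F1. This is an assumption for illustration only: the abstract does not say which lexical metric SayNext-Bench uses, and the names `tokenize` and `token_f1` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def token_f1(predicted: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference utterance.

    Illustrative stand-in for a lexical-similarity score; not the
    metric defined by the paper.
    """
    pred_tokens = tokenize(predicted)
    ref_tokens = tokenize(reference)

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both utterances.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A score like this captures only surface overlap, which is presumably why the framework also includes emotion-intention consistency and LLM-based alignment: two utterances can share few words yet express the same intent.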
