ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation

Zhebo Wang; Xiaohu Mu; Zijie Zhou; Mohan Li; Wenpeng Xing; Dezhang Kong; Meng Han

arXiv:2601.15330·cs.CL·January 23, 2026

TL;DR

This paper introduces ICPO, a training framework that improves multi-turn conversational AI by making models more sensitive to ambiguous instructions, so that they express uncertainty or ask for clarification instead of answering overconfidently.

Contribution

ICPO is a novel training method that augments the training corpus with underspecified prompts and conditions the reward on the user's illocutionary intent, improving model humility and robustness in multi-turn conversations.

Findings

01. 75% average improvement in multi-turn conversation performance

02. Models exhibit increased humility and clarification-seeking behavior

03. Preserves performance on single-turn benchmarks

Abstract

Large Language Models (LLMs) in multi-turn conversations often suffer from a "lost-in-conversation" phenomenon, where they struggle to recover from early incorrect assumptions, particularly when users provide ambiguous initial instructions. We find that standard post-training techniques like Reinforcement Learning with Verifiable Rewards (RLVR) exacerbate this issue by rewarding confident, direct answers, thereby inducing overconfidence and discouraging the model from seeking clarification. To address this, we propose Illocution-Calibrated Policy Optimization (ICPO), a novel training framework that sensitizes the model to instruction ambiguity. ICPO augments the training corpus with underspecified prompts and conditions the reward signal on the user's illocutionary intent, rewarding the model for expressing uncertainty or asking for clarification when faced with ambiguity. Experiments…
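To make the idea of an intent-conditioned reward concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify how ambiguity is labeled, how clarification is detected, or what reward values are used. The `Sample` class, the `is_ambiguous` label, the keyword-based `seeks_clarification` heuristic, and the 0/1 reward values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrases used to detect a clarification request (assumption,
# not taken from the paper; a real system would likely use a learned judge).
CLARIFY_MARKERS = ("could you clarify", "can you specify", "do you mean")


@dataclass
class Sample:
    prompt: str
    is_ambiguous: bool          # hypothetical label: is the prompt underspecified?
    reference: Optional[str]    # verifiable answer, if the prompt is fully specified


def seeks_clarification(response: str) -> bool:
    """Crude heuristic: the response ends with a question or uses a marker phrase."""
    text = response.lower().strip()
    return text.endswith("?") or any(marker in text for marker in CLARIFY_MARKERS)


def calibrated_reward(sample: Sample, response: str) -> float:
    """Reward clarification on ambiguous prompts and correctness on clear ones."""
    asked = seeks_clarification(response)
    if sample.is_ambiguous:
        # Underspecified prompt: asking is rewarded, a confident answer is not.
        return 1.0 if asked else 0.0
    if asked:
        # Fully specified prompt: an unnecessary clarification earns nothing.
        return 0.0
    # Fully specified prompt: fall back to a verifiable-answer check.
    return 1.0 if sample.reference is not None and sample.reference in response else 0.0


ambiguous = Sample("Write the function.", is_ambiguous=True, reference=None)
clear = Sample("What is 2 + 2?", is_ambiguous=False, reference="4")
print(calibrated_reward(ambiguous, "Which language do you mean?"))
print(calibrated_reward(clear, "The answer is 4."))
```

The contrast with a plain verifiable reward is the first branch: a standard RLVR-style check would score only answer correctness, so a clarifying question on an underspecified prompt could never be rewarded.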

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions