Dream to Chat: Model-based Reinforcement Learning on Dialogues with User Belief Modeling

Yue Zhao; Xiaoyu Wang; Dan Wang; Zhonglin Jiang; Qingqing Gu; Teng Chen; Ningyuan Xi; Jinxian Qu; Yong Chen; Luo Ji

arXiv:2508.16876 · cs.CL · September 29, 2025

TL;DR

This paper introduces DreamCUB, a model-based reinforcement learning framework that uses a dialogue world model to predict user beliefs and improve dialogue quality. The pretrained world model achieves state-of-the-art results on emotion classification and sentiment identification.

Contribution

The paper develops a dialogue world model that predicts user beliefs (emotion, sentiment, and intention) and integrates this model into a model-based reinforcement learning framework for dialogue systems.

Findings

01 State-of-the-art emotion classification accuracy

02 Improved sentiment identification performance

03 Enhanced dialogue quality through joint training

Abstract

World models have been widely used in robotics, gaming, and autonomous driving. However, their applications to natural language tasks are relatively limited. In this paper, we construct a dialogue world model that can predict the user's emotion, sentiment, intention, and future utterances. By defining a POMDP, we argue that emotion, sentiment, and intention can be modeled as the user belief and solved by maximizing the information bottleneck. With this user belief modeling, we apply the model-based reinforcement learning framework to the dialogue system and propose a framework called DreamCUB. Experiments show that the pretrained dialogue world model achieves state-of-the-art performance on emotion classification and sentiment identification, while dialogue quality is also enhanced by joint training of the policy, critic, and dialogue world model. Further analysis shows that this…
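
To make the notion of a "user belief" in a POMDP concrete, the sketch below shows a textbook discrete Bayes filter that maintains a probability distribution over a hidden user emotion. It is not the DreamCUB method, which the abstract describes as learned through an information bottleneck objective. The emotion labels, transition probabilities, and observation likelihoods are all hypothetical values chosen for illustration.

```python
from typing import Dict

Belief = Dict[str, float]

# Hypothetical emotion labels (an assumption, not taken from the paper).
EMOTIONS = ("happy", "neutral", "sad")


def update_belief(
    prior: Belief,
    transition: Dict[str, Dict[str, float]],
    likelihood: Dict[str, float],
) -> Belief:
    """One step of a discrete Bayes filter over a hidden user state.

    prior:      current belief b(s) over hidden states
    transition: transition[s][s_next] = P(s_next | s)
    likelihood: likelihood[s_next] = P(observed utterance | s_next)
    """
    # Prediction step: push the prior through the transition model.
    predicted = {
        s_next: sum(prior[s] * transition[s][s_next] for s in prior)
        for s_next in prior
    }
    # Correction step: weight each state by how well it explains the observation.
    unnormalized = {s: predicted[s] * likelihood[s] for s in predicted}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every state")
    return {s: p / total for s, p in unnormalized.items()}


# Illustrative numbers only: emotions tend to persist between turns.
transition = {
    s: {t: (0.8 if s == t else 0.1) for t in EMOTIONS} for s in EMOTIONS
}
prior = {s: 1.0 / len(EMOTIONS) for s in EMOTIONS}
# Hypothetical likelihood of a negative-sounding user utterance under each emotion.
likelihood = {"happy": 0.05, "neutral": 0.25, "sad": 0.70}

posterior = update_belief(prior, transition, likelihood)
most_likely = max(posterior, key=posterior.get)
print(most_likely, posterior)
```

In a learned dialogue world model, the hand-written transition and likelihood tables would be replaced by model predictions, but the role of the belief as a summary of the unobserved user state stays the same.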
