Improving Multi-turn Dialogue Consistency with Self-Recall Thinking
Renning Pang, Tian Lan, Leyuan Liu, Xiaoming Huang, Piao Tong, and Xiaosong Zhang

TL;DR
This paper introduces Self-Recall Thinking (SRT), a framework that improves long-range dependency tracking and response consistency in multi-turn dialogue systems without relying on external memory modules.
Contribution
SRT enables models to selectively recall relevant historical turns and reason over context, improving dialogue coherence and efficiency compared to existing methods.
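To make the recall-then-respond idea concrete, here is a minimal sketch of selecting relevant earlier turns before building a prompt. It is not the paper's method: in SRT the model itself performs the recall during inference, whereas this sketch swaps in a simple lexical-overlap (Jaccard) heuristic as a stand-in. The names `recall_turns`, `build_prompt`, `STOPWORDS`, and the `top_k` parameter are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, Set, Tuple

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "to", "for", "i", "my", "what",
             "in", "you", "can", "should", "of"}


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def recall_turns(history: List[str], query: str, top_k: int = 2) -> List[Tuple[int, str]]:
    """Return up to top_k earlier turns that share vocabulary with the query.

    Stand-in heuristic: Jaccard overlap between token sets. Turns with zero
    overlap are never recalled. Results come back in chronological order.
    """
    q = tokenize(query)
    scored = []
    for idx, turn in enumerate(history):
        t = tokenize(turn)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, idx))
    # Highest score first; ties go to the more recent turn.
    best = sorted(scored, key=lambda s: (-s[0], -s[1]))[:top_k]
    return [(idx, history[idx]) for idx in sorted(i for _, i in best)]


def build_prompt(history: List[str], query: str, top_k: int = 2) -> str:
    """Assemble a prompt from the recalled turns only, not the full history."""
    lines = ["Recalled turns:"]
    lines += [f"[turn {i}] {t}" for i, t in recall_turns(history, query, top_k)]
    lines += ["Current user message:", query]
    return "\n".join(lines)
```

Given a history whose first turn mentions a pet named Biscuit and whose later turns concern unrelated topics, a query about Biscuit recalls only that first turn, so the prompt stays short however long the conversation grows. A learned or model-internal recall step, as the paper describes, would replace the overlap heuristic.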
Findings
SRT improves F1 score by 4.7% over prior methods.
SRT reduces end-to-end latency by 14.7%.
SRT outperforms state-of-the-art baselines in multi-turn dialogue tasks.
Abstract
Large language model (LLM)-based multi-turn dialogue systems often struggle to track dependencies across non-adjacent turns, undermining both consistency and scalability. As conversations lengthen, essential information becomes sparse and is buried in irrelevant context, while processing the entire dialogue history incurs severe efficiency bottlenecks. Existing solutions either rely on high-latency external memory or lose fine-grained details through iterative summarization. In this paper, we propose Self-Recall Thinking (SRT), a framework designed to address long-range contextual dependency and sparse informative signals in multi-turn dialogue. SRT identifies helpful historical turns and uses them to generate contextually appropriate responses, enabling the model to selectively recall and reason over context during inference. This process yields an endogenous reasoning process that…
