Improving Rationality in the Reasoning Process of Language Models through Self-playing Game
Pinzheng Wang, Juntao Li, Zecheng Tang, Haijia Gui, Min Zhang

TL;DR
This paper introduces a self-play framework called Critic-Discernment Game to improve the reasoning rationality of large language models without human supervision, enhancing their comprehension and self-correction abilities.
Contribution
The paper proposes a novel self-play training method for LLMs that improves reasoning rationality through a game involving critique and self-correction, without external supervision.
Findings
Enhanced reasoning accuracy in mathematical tasks
Improved error detection and self-correction
Better long-chain reasoning capabilities
Abstract
Large language models (LLMs) have demonstrated considerable reasoning abilities in various tasks such as mathematics and coding. However, recent studies indicate that even the best models lack true comprehension of their reasoning processes. In this paper, we explore how self-play can enhance the rationality of models in the reasoning process without supervision from humans or superior models. We design a Critic-Discernment Game (CDG) in which a prover first provides a solution to a given problem and is subsequently challenged by critiques of its solution. These critiques either aim to assist or mislead the prover. The objective of the prover is to maintain the correct answer when faced with misleading comments, while correcting errors in response to constructive feedback. Our experiments on tasks involving mathematical reasoning, stepwise error detection, self-correction, and long-chain…
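The game described in the abstract can be sketched as a single round between a prover and a critic. This is a minimal illustration, not the authors' implementation: the function names (`play_round`, `toy_solve`, `toy_criticize`, `toy_revise`), the use of a reference answer to score the round, the rule that a correct first answer draws a misleading critique while a wrong one draws a helpful critique, and the 0/1 reward values are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoundResult:
    first_answer: str
    critique_kind: str  # "misleading" or "helpful"
    final_answer: str
    prover_reward: float
    critic_reward: float


def play_round(
    problem: str,
    reference: str,
    solve: Callable[[str], str],
    criticize: Callable[[str, str, str], str],
    revise: Callable[[str, str, str], str],
) -> RoundResult:
    """Play one hypothetical prover/critic round and score both sides."""
    first = solve(problem)

    # Assumption: a correct first answer is attacked with a misleading
    # critique; an incorrect one receives a helpful critique.
    kind = "misleading" if first == reference else "helpful"
    critique = criticize(problem, first, kind)

    # The prover sees the critique and either keeps or changes its answer.
    final = revise(problem, first, critique)
    final_ok = final == reference

    # Assumption: the prover scores whenever it ends on the correct answer.
    prover_reward = 1.0 if final_ok else 0.0

    # Assumption: a misleading critic scores when the prover is talked out
    # of a correct answer; a helpful critic scores when the prover ends up
    # correct after the critique.
    if kind == "misleading":
        critic_reward = 0.0 if final_ok else 1.0
    else:
        critic_reward = 1.0 if final_ok else 0.0

    return RoundResult(first, kind, final, prover_reward, critic_reward)


# Toy stand-ins for the language models (hypothetical, for illustration).
def toy_solve(problem: str) -> str:
    return "4"


def toy_criticize(problem: str, answer: str, kind: str) -> str:
    return f"[{kind}] Step 2 looks wrong."


def toy_revise(problem: str, answer: str, critique: str) -> str:
    # A prover that holds its ground regardless of the critique.
    return answer


result = play_round("2 + 2", "4", toy_solve, toy_criticize, toy_revise)
```

In this toy run the prover answers correctly, receives a misleading critique, and keeps its answer, so under the assumed scoring the prover is rewarded and the critic is not.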
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
