Improving Rationality in the Reasoning Process of Language Models through Self-playing Game

Pinzheng Wang; Juntao Li; Zecheng Tang; Haijia Gui; Min Zhang

arXiv:2506.22920·cs.AI·July 8, 2025

TL;DR

This paper introduces a self-play framework called Critic-Discernment Game to improve the reasoning rationality of large language models without human supervision, enhancing their comprehension and self-correction abilities.

Contribution

The paper proposes a novel self-play training method for LLMs that improves reasoning rationality through a game involving critique and self-correction, without external supervision.

Findings

01. Enhanced reasoning accuracy in mathematical tasks

02. Improved error detection and self-correction

03. Better long-chain reasoning capabilities

Abstract

Large language models (LLMs) have demonstrated considerable reasoning abilities in various tasks such as mathematics and coding. However, recent studies indicate that even the best models lack true comprehension of their reasoning processes. In this paper, we explore how self-play can enhance the rationality of models in the reasoning process without supervision from humans or superior models. We design a Critic-Discernment Game (CDG) in which a prover first provides a solution to a given problem and is subsequently challenged by critiques of its solution. These critiques either aim to assist or mislead the prover. The objective of the prover is to maintain the correct answer when faced with misleading comments, while correcting errors in response to constructive feedback. Our experiments on tasks involving mathematical reasoning, stepwise error detection, self-correction, and long-chain…
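
The prover's objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `CritiqueKind` and `classify_round` and the outcome labels are hypothetical, and the sketch covers only the two situations the abstract states (a misleading critique of a correct solution, and a constructive critique of an incorrect one). How rounds are scored or turned into training signal is not specified in the text above and is not modeled here.

```python
from enum import Enum


class CritiqueKind(Enum):
    """Hypothetical labels for the two critique intents named in the abstract."""
    HELPFUL = "helpful"
    MISLEADING = "misleading"


def classify_round(initial_correct: bool,
                   critique: CritiqueKind,
                   final_correct: bool) -> str:
    """Label one prover round; labels are illustrative, not from the paper."""
    if critique is CritiqueKind.MISLEADING and initial_correct:
        # Prover should keep its correct answer despite the misleading critique.
        return "resisted" if final_correct else "misled"
    if critique is CritiqueKind.HELPFUL and not initial_correct:
        # Prover should fix its error in response to constructive feedback.
        return "corrected" if final_correct else "uncorrected"
    # Other pairings are not described in the abstract, so they are left unlabeled.
    return "not_described"


if __name__ == "__main__":
    print(classify_round(True, CritiqueKind.MISLEADING, True))
    print(classify_round(False, CritiqueKind.HELPFUL, True))
```

Under these assumptions, the prover achieves its objective in the "resisted" and "corrected" cases and fails it in the "misled" and "uncorrected" cases.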

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)