ALIVE: Awakening LLM Reasoning via Adversarial Learning and Instructive Verbal Evaluation
Yiwen Duan, Jing Ye, Xinpei Zhao

TL;DR
ALIVE is a framework that strengthens large language models' reasoning by having them internalize evaluative criteria through adversarial learning and verbal feedback, reducing reliance on external rewards.
Contribution
It presents a unified, self-contained approach to reasoning alignment that moves beyond scalar rewards, enabling models to internalize correctness logic from raw data.
Findings
Improves reasoning accuracy across multiple benchmarks.
Enhances cross-domain generalization of LLMs.
Increases self-correction capabilities.
Abstract
The quest for expert-level reasoning in Large Language Models (LLMs) has been hampered by a persistent reward bottleneck: traditional reinforcement learning (RL) relies on scalar rewards that are costly to scale, brittle across domains, and blind to the underlying logic of a solution. This reliance on external, impoverished signals prevents models from developing a deep, self-contained understanding of reasoning principles. We introduce ALIVE (Adversarial Learning with Instructive Verbal Evaluation), a hands-free alignment framework that moves beyond scalar reward optimization toward intrinsic reasoning acquisition. Grounded in the principle of Cognitive Synergy, ALIVE unifies problem posing, solving, and judging within a single policy model to internalize the logic of correctness. By coupling adversarial learning with…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
