Beyond Introspection: Reinforcing Thinking via Externalist Behavioral Feedback
Diji Yang, Linda Zeng, Kezhen Chen, Yi Zhang

TL;DR
This paper introduces Distillation-Reinforcement-Reasoning (DRR), an externalist three-step framework that improves LLM reasoning by evaluating the model's observable behaviors and returning corrective feedback, and that outperforms self-critique methods.
Contribution
The proposed DRR framework moves beyond introspective self-critique by using external behavioral feedback to enhance reasoning accuracy in LLMs without modifying the base model.
Findings
DRR significantly outperforms self-critique methods on reasoning benchmarks.
DRR is lightweight and annotation-free, which makes it scalable.
DRR improves reasoning reliability across various LLMs.
Abstract
While inference-time thinking allows Large Language Models (LLMs) to address complex problems, the extended thinking process can be unreliable or inconsistent because of the model's probabilistic nature, especially near its knowledge boundaries. Existing approaches attempt to mitigate this by having the model critique its own reasoning to make corrections. However, such self-critique inherits the same biases as the original output, a problem known as the introspection illusion. Moving beyond such introspection and inspired by core methodologies in ethology, we propose an externalist three-step framework, Distillation-Reinforcement-Reasoning (DRR). Rather than relying on a model's introspection, DRR evaluates its observable behaviors to provide corrective feedback. DRR first distills the reasoner's behavioral traces, then trains a lightweight, external Discriminative Model (DM). At inference time,…
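To make the externalist idea concrete, the following is a minimal sketch of a generic feedback loop in which a separate judge, rather than the reasoner itself, decides whether an answer is accepted. It is not the paper's implementation: the abstract above is cut off before the inference-time procedure is described, so the loop structure, the names `Trace`, `external_feedback_loop`, `toy_reasoner`, and `toy_discriminator`, the feedback wording, and the `max_rounds` parameter are all assumptions made for illustration. In particular, the toy discriminator is a hand-written rule standing in for the trained Discriminative Model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trace:
    """One observable behavior of the reasoner (hypothetical structure)."""
    question: str
    reasoning: str
    answer: str


# Hypothetical interfaces: the reasoner sees the question plus any feedback
# collected so far; the discriminator only sees the resulting trace.
Reasoner = Callable[[str, List[str]], Trace]
Discriminator = Callable[[Trace], bool]


def external_feedback_loop(
    question: str,
    reasoner: Reasoner,
    discriminator: Discriminator,
    max_rounds: int = 3,
) -> Tuple[Trace, List[str]]:
    """Query the reasoner until the external judge accepts or rounds run out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: List[str] = []
    for _ in range(max_rounds):
        trace = reasoner(question, feedback)
        if discriminator(trace):
            break  # accepted by the external judge
        # Assumed feedback format; the paper's actual format is not given here.
        feedback.append(f"Answer '{trace.answer}' was rejected; reconsider.")
    return trace, feedback


def toy_reasoner(question: str, feedback: List[str]) -> Trace:
    """Stand-in for an LLM: wrong on the first try, right after feedback."""
    answer = "4" if feedback else "5"
    return Trace(question, f"I computed {question} = {answer}", answer)


def toy_discriminator(trace: Trace) -> bool:
    """Stand-in for a trained DM: a fixed rule, used only for this demo."""
    return trace.answer == "4"


if __name__ == "__main__":
    final, history = external_feedback_loop("2 + 2", toy_reasoner, toy_discriminator)
    print(final.answer, len(history))
```

The point of the design is that the accept/reject decision lives outside the reasoner, so swapping in a different judge does not require touching the base model, which matches the claim that DRR works without modifying it.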
Taxonomy
Topics: Complex Systems and Decision Making · Intelligent Tutoring Systems and Adaptive Learning
