Thinker: Learning to Think Fast and Slow
Stephen Chung, Wenyu Du, Jie Fu

TL;DR
This paper introduces Thinker, a four-stage reasoning framework inspired by psychology, which improves LLM reasoning accuracy and efficiency by separating fast, intuitive responses from slow, deliberative verification and refinement.
Contribution
The paper proposes a four-stage QA task, inspired by Dual Process Theory, that improves LLM reasoning accuracy and inference efficiency, and it demonstrates the task's effectiveness on multiple models.
Findings
Accuracy improved from 25.6% to 27.3% for Qwen2.5-1.5B.
Fast Thinking mode achieves 25.2% accuracy with fewer than 1000 tokens.
The models and code are open-sourced for reproducibility.
Abstract
Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs may learn to perform search, as indicated by the self-correction behavior observed in DeepSeek R1. However, this search behavior is often imprecise and lacks confidence, resulting in long, redundant responses and highlighting deficiencies in intuition and verification. Inspired by the Dual Process Theory in psychology, we introduce a simple modification to the QA task that includes four stages: Fast Thinking, where the LLM must answer within a strict token budget; Verification, where the model evaluates its initial response; Slow Thinking, where it refines the initial response with more deliberation; and Summarization, where it distills the refinement…
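The four stages named in the abstract can be sketched as a sequential pipeline. This is a minimal illustration, not the authors' implementation: `run_four_stages`, `ThinkerTrace`, the `generate` callable, the prompt wording, and the `long_budget` default are all hypothetical. The `fast_budget` default of 1000 tokens only echoes the figure quoted under Findings. The sketch runs every stage unconditionally; the excerpt above does not say whether any stage is skipped.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: (prompt, max_tokens) -> generated text.
Generate = Callable[[str, int], str]


@dataclass
class ThinkerTrace:
    """Outputs of the four stages, kept separately for inspection."""
    fast_answer: str
    verification: str
    slow_answer: str
    summary: str


def run_four_stages(
    question: str,
    generate: Generate,
    fast_budget: int = 1000,   # assumption: strict budget for Fast Thinking
    long_budget: int = 6000,   # assumption: budget for the remaining stages
) -> ThinkerTrace:
    # Stage 1, Fast Thinking: answer within a strict token budget.
    fast_answer = generate(
        f"Question: {question}\nGive your answer concisely.",
        fast_budget,
    )

    # Stage 2, Verification: the model evaluates its own initial response.
    verification = generate(
        f"Question: {question}\nInitial answer: {fast_answer}\n"
        "Check whether the initial answer is correct.",
        long_budget,
    )

    # Stage 3, Slow Thinking: refine the initial response with more deliberation.
    slow_answer = generate(
        f"Question: {question}\nInitial answer: {fast_answer}\n"
        f"Verification: {verification}\n"
        "Reason carefully and give a refined answer.",
        long_budget,
    )

    # Stage 4, Summarization: distill the refinement into a short form.
    summary = generate(
        f"Question: {question}\nRefined answer: {slow_answer}\n"
        "Summarize the key steps leading to the refined answer.",
        long_budget,
    )

    return ThinkerTrace(fast_answer, verification, slow_answer, summary)
```

Passing `generate` in as a plain callable keeps the sketch independent of any particular inference library; a real setup would wrap a model call that enforces `max_tokens`.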
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification
