AdapThink: Adaptive Thinking Preferences for Reasoning Language Model
Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun

TL;DR
AdapThink introduces an adaptive post-training framework for reasoning language models that dynamically balances reasoning depth and efficiency, improving performance on mathematical reasoning tasks.
Contribution
It proposes a novel adaptive mechanism combining confidence-based rewards and diversity-aware sampling to enhance reasoning efficiency and effectiveness.
Findings
Improves reasoning efficiency by reducing unnecessary computation.
Maintains or enhances accuracy on mathematical reasoning datasets.
Enables dynamic adjustment of reasoning depth based on question complexity.
Abstract
Reinforcement Learning (RL)-based post-training has significantly advanced the complex reasoning capabilities of language models, fostering sophisticated self-reflection processes. However, this "slow thinking" paradigm presents a critical challenge to reasoning efficiency: models may expend excessive computation on simple questions and shift reasoning prematurely for complex ones. Previous mechanisms typically rely on static length budgets or predefined rules, lacking the adaptability for varying question complexities and models' evolving capabilities. To this end, we propose AdapThink, an adaptive post-training framework designed to induce more efficient thinking while maintaining the performance of reasoning language models. Specifically, AdapThink incorporates two key mechanisms: 1) A group-relative reward function that leverages model confidence and response's characteristic to…
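The abstract is cut off before it specifies the reward, so the sketch below is not AdapThink's reward function. It only illustrates the generic "group-relative" idea that such rewards build on: several responses are sampled for the same question, and each response's scalar reward is standardized against the others in its group. The function name `group_relative_advantages`, the `eps` constant, and the example reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the group sampled for one question.

    Illustrative only: the scoring that produces `rewards` (for example,
    anything based on confidence or response length) is NOT modeled here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for a single question.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Responses scoring above the group mean receive a positive value and those below receive a negative one, so the training signal depends on how a response compares with its siblings and not on the absolute reward scale.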
