Thinking LLMs: General Instruction Following with Thought Generation
Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, Sainbayar Sukhbaatar

TL;DR
This paper introduces a training method that enables large language models to generate explicit thoughts before answering, improving their performance across reasoning and non-reasoning tasks without additional human data.
Contribution
It proposes an iterative search and optimization approach to teach LLMs explicit thinking abilities for instruction following without supervised data.
Findings
Improved performance on AlpacaEval and Arena-Hard benchmarks.
Gains observed in non-reasoning categories like marketing, health, and general knowledge.
Effective enhancement of LLMs' thinking capabilities without extra human annotations.
Abstract
LLMs are typically trained to answer user questions or follow instructions similarly to how human experts respond. However, in the standard alignment framework they lack the basic ability of explicit thinking before answering. Thinking is important for complex questions that require reasoning and planning -- but can be applied to any task. We propose a training method for equipping existing LLMs with such thinking abilities for general instruction following without use of additional human data. We achieve this by an iterative search and optimization procedure that explores the space of possible thought generations, allowing the model to learn how to think without direct supervision. For each instruction, the thought candidates are scored using a judge model to evaluate their responses only, and then optimized via preference optimization. We show that this procedure leads to superior…
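The search step described in the abstract can be illustrated with a minimal sketch: sample several thought-plus-response candidates for an instruction, score only the response part with a judge, and keep the best and worst candidates as a preference pair. Everything named here is an assumption made for illustration and not taken from the paper: the `Response:` marker, the `generate` and `judge` callables, the sample count, and the best-versus-worst pairing rule. The preference-optimization update itself is left out.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Candidate:
    thought: str   # hidden reasoning text; never shown to the judge
    response: str  # final answer; the only part the judge scores


def split_output(text: str, marker: str = "Response:") -> Candidate:
    """Split raw model output into thought and response.

    The marker is a hypothetical convention chosen for this sketch.
    Output without the marker is treated as response only.
    """
    thought, sep, response = text.partition(marker)
    if not sep:
        return Candidate(thought="", response=text.strip())
    return Candidate(thought=thought.strip(), response=response.strip())


def build_preference_pair(
    instruction: str,
    generate: Callable[[str], str],      # hypothetical sampler: instruction -> raw output
    judge: Callable[[str, str], float],  # hypothetical judge: (instruction, response) -> score
    num_samples: int = 4,                # assumed sample count
) -> Tuple[Candidate, Candidate]:
    """Return (chosen, rejected) candidates for one instruction."""
    candidates = [split_output(generate(instruction)) for _ in range(num_samples)]
    # The judge sees the response only, so thoughts are rewarded indirectly
    # through the quality of the answers they lead to.
    scored = [(judge(instruction, c.response), c) for c in candidates]
    chosen = max(scored, key=lambda pair: pair[0])[1]
    rejected = min(scored, key=lambda pair: pair[0])[1]
    return chosen, rejected
```

In a full training loop, the chosen and rejected candidates (thought and response together) would be passed to a preference-optimization step, and the updated model would generate the candidates for the next iteration.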
Taxonomy
Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations
