The ART of LLM Refinement: Ask, Refine, and Trust
Kumar Shridhar, Koustuv Sinha, Andrew Cohen, Tianlu Wang, Ping Yu, Ram Pasunuru, Mrinmaya Sachan, Jason Weston, Asli Celikyilmaz

TL;DR
This paper introduces ART, a reasoning-based refinement method for LLMs that improves their self-evaluation and correction abilities, especially in multistep reasoning tasks, with better performance and efficiency.
Contribution
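The paper proposes the ART framework, which improves LLM self-refinement in two ways: it asks questions to decide when an output needs refinement, and it ranks the refinement against the initial prediction to decide which one to trust. It outperforms self-refinement baselines.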
Findings
On GSM8K and StrategyQA, ART achieves a gain of about +5 points over self-refinement baselines.
Using smaller models for decision-making reduces costs while maintaining effectiveness.
ART improves multistep reasoning in mathematical and question-answering tasks.
Abstract
In recent years, Large Language Models (LLMs) have demonstrated remarkable generative abilities, but can they judge the quality of their own generations? A popular concept, referred to as self-refinement, postulates that LLMs can detect and correct the errors in their generations when asked to do so. However, recent empirical evidence points in the opposite direction, suggesting that LLMs often struggle to accurately identify errors when reasoning is involved. To address this, we propose a reasoning with refinement objective called ART: Ask, Refine, and Trust, which asks necessary questions to decide when an LLM should refine its output, and either affirm or withhold trust in its refinement by ranking the refinement and the initial prediction. On two multistep reasoning tasks of mathematical word problems (GSM8K) and question answering (StrategyQA), ART achieves a performance gain of +5…
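The Ask, Refine, and Trust stages described in the abstract can be sketched as a simple control flow. This is a minimal illustration, not the authors' implementation: the function `art_pipeline` and every callable it takes (`generate`, `ask`, `needs_refinement`, `refine`, `rank`) are hypothetical placeholders for model calls that the abstract does not specify.

```python
from typing import Callable, List


def art_pipeline(
    problem: str,
    generate: Callable[[str], str],
    ask: Callable[[str, str], List[str]],
    needs_refinement: Callable[[str, str, List[str]], bool],
    refine: Callable[[str, str, List[str]], str],
    rank: Callable[[str, str, str], str],
) -> str:
    """Hypothetical sketch of an Ask -> Refine -> Trust loop.

    All callables are assumed stand-ins for model calls; none of these
    names come from the paper.
    """
    # Initial prediction from the base model.
    initial = generate(problem)

    # Ask: pose questions about the problem and the initial prediction.
    questions = ask(problem, initial)

    # Decide whether refinement is needed; if not, keep the initial answer.
    if not needs_refinement(problem, initial, questions):
        return initial

    # Refine: produce a revised answer using the questions as guidance.
    refined = refine(problem, initial, questions)

    # Trust: rank the refinement against the initial prediction and
    # return whichever the ranker prefers.
    return rank(problem, initial, refined)
```

Keeping each stage as a separate callable reflects the point that the refinement decision can be made by a different, possibly smaller, model than the one that generates the answer.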
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
