Teaching LLMs to Refine with Tools

Dian Yu; Yuheng Zhang; Jiahao Xu; Tian Liang; Linfeng Song; Zhaopeng Tu; Haitao Mi; Dong Yu

arXiv:2412.16871 · cs.CL · December 24, 2024

Open Access

TL;DR

This paper introduces CaP, a method that uses external tools and preference optimization to refine chain-of-thought responses from LLMs. It targets a limitation of earlier refinement methods, which mostly stay within a single reasoning format and can fail to correct errors.

Contribution

CaP is the first approach to combine external tool use with preference optimization for cross-reasoning refinement in LLMs, enhancing their self-improvement capabilities.

Findings

01 CaP effectively improves cross-reasoning refinement in LLMs.

02 Preference optimization is crucial for successful refinement.

03 Sampling strategies influence inference efficiency and quality.

Abstract

Large language models (LLMs) can refine their responses based on feedback, enabling self-improvement through iterative training or test-time refinement. However, existing methods predominantly focus on refinement within the same reasoning format, which may lead to non-correcting behaviors. We propose CaP, a novel approach that uses external tools to refine chain-of-thought (CoT) responses generated by the same or other LLMs. CaP employs a two-stage training process: supervised fine-tuning followed by preference optimization with DPO variants. Our observations highlight the critical role of preference optimization in enabling effective refinement. Additionally, we compare several sampling strategies to leverage CoT and tools at inference time. Experimental results demonstrate CaP's potential for effective cross-reasoning refinement and efficient inference.
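The preference-optimization stage mentioned in the abstract can be illustrated with the standard DPO loss for a single preference pair. This is a minimal sketch, not the authors' implementation: the abstract refers to "DPO variants" without naming them, so only the vanilla objective is shown, and the function name `dpo_loss`, the helper `softplus`, and the default `beta=0.1` are assumptions made for illustration. The inputs are summed log-probabilities of the preferred (chosen) and dispreferred (rejected) responses under the trained policy and a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """Vanilla DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # Implicit rewards: how much more likely each response is under the policy
    # than under the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-logits)


if __name__ == "__main__":
    # Hypothetical log-probabilities: the policy favors the chosen response
    # more than the reference does, so the loss falls below log(2).
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2; it decreases as the policy raises the chosen response relative to the rejected one. How CaP builds its preference pairs (for example, which refinement counts as chosen) is not specified in the abstract, so the sketch leaves that open.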

Taxonomy

Topics: Artificial Intelligence in Law · Natural Language Processing Techniques · Library Science and Information Systems

Methods: Direct Preference Optimization · Focus