Advancing Process Verification for Large Language Models via Tree-Based Preference Learning
Mingqian He, Yongliang Shen, Wenqi Zhang, Zeqi Tan, Weiming Lu

TL;DR
This paper introduces Tree-PLV, a novel preference learning verifier that constructs reasoning trees to more effectively evaluate and improve large language models' reasoning accuracy, significantly outperforming existing methods.
Contribution
The paper proposes Tree-PLV, a new tree-based preference learning approach that captures nuanced reasoning step preferences, enhancing LLM verification and reasoning performance.
Findings
Tree-PLV outperforms baseline methods on multiple reasoning benchmarks.
Step-level preference learning improves evaluation accuracy.
Significant performance gains on GSM8K, MATH, CSQA, and StrategyQA.
Abstract
Large Language Models (LLMs) have demonstrated remarkable potential in handling complex reasoning tasks by generating step-by-step rationales. Some methods have proven effective in boosting accuracy by introducing extra verifiers to assess these paths. However, existing verifiers, typically trained on binary-labeled reasoning paths, fail to fully utilize the relative merits of intermediate steps, thereby limiting the effectiveness of the feedback provided. To overcome this limitation, we propose Tree-based Preference Learning Verifier (Tree-PLV), a novel approach that constructs reasoning trees via a best-first search algorithm and collects step-level paired data for preference training. Compared to traditional binary classification, step-level preferences more finely capture the nuances between reasoning steps, allowing for a more precise evaluation of the complete reasoning path. We…
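The abstract names two ingredients: a reasoning tree grown by best-first search, and step-level preference pairs used for training. The sketch below is only an illustration of that general idea, not the authors' implementation. The `expand` and `score` callables, the `margin` threshold for keeping a pair, the toy `STEP_QUALITY` table, and the Bradley-Terry style `pairwise_loss` are all assumptions introduced here; this summary does not say how Tree-PLV scores steps, selects pairs, or defines its training objective.

```python
import heapq
import math
from itertools import count


def best_first_pairs(root, expand, score, max_expansions=3, margin=0.2):
    """Grow a reasoning tree best-first and collect sibling preference pairs.

    expand(path) -> list of candidate next steps (hypothetical generator)
    score(path)  -> float quality estimate of a partial path (hypothetical)
    Returns a list of (preferred_path, rejected_path) tuples.
    """
    tie = count()  # tie-breaker so heapq never has to compare paths
    frontier = [(-score(root), next(tie), root)]  # max-heap via negated score
    pairs = []
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, path = heapq.heappop(frontier)  # most promising partial path
        children = [path + (step,) for step in expand(path)]
        scored = sorted(((score(c), c) for c in children),
                        key=lambda t: t[0], reverse=True)
        if scored:
            best_score, best = scored[0]
            # Assumption: pair the best sibling with clearly worse siblings.
            for s, c in scored[1:]:
                if best_score - s >= margin:
                    pairs.append((best, c))
        for s, c in scored:
            heapq.heappush(frontier, (-s, next(tie), c))
    return pairs


def pairwise_loss(r_preferred, r_rejected):
    """Generic ranking loss: -log(sigmoid(r_preferred - r_rejected))."""
    return math.log1p(math.exp(-(r_preferred - r_rejected)))


# Toy stand-ins for a step generator and a step scorer (purely illustrative).
STEP_QUALITY = {"good": 0.9, "ok": 0.6, "bad": 0.1}


def toy_expand(path):
    # Offer three candidate steps until the path is two steps long.
    return ["good", "ok", "bad"] if len(path) < 2 else []


def toy_score(path):
    # Mean quality of the steps so far; the empty root scores 0.
    return sum(STEP_QUALITY[s] for s in path) / len(path) if path else 0.0


pairs = best_first_pairs((), toy_expand, toy_score, max_expansions=2)
for preferred, rejected in pairs:
    print(preferred, ">", rejected)
```

A verifier trained on such pairs would be pushed to score the preferred step above the rejected one, for example by minimizing `pairwise_loss` over the verifier's two outputs. That contrasts with fitting an independent binary label for each path, which is the limitation the abstract attributes to existing verifiers.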
Taxonomy
Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis · Natural Language Processing Techniques
