TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling

Jiahao Qiu; Yifu Lu; Yifan Zeng; Jiacheng Guo; Jiayi Geng; Chenhao Zhu; Xinzhe Juan; Ling Yang; Huazheng Wang; Kaixuan Huang; Yue Wu; Mengdi Wang

arXiv:2410.16033 · cs.CL · September 4, 2025

TL;DR

TreeBoN introduces a speculative tree-search method integrated with Best-of-N sampling to improve large language model inference efficiency and output quality without additional training.
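Best-of-N, the baseline that TreeBoN builds on, fits in a few lines. The sketch below assumes two hypothetical callables: `generate`, which samples one complete response from a model, and `reward`, which scores a prompt-response pair. Neither is an API from the paper or from a specific library.

```python
import random
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int,
) -> str:
    """Sample n complete responses and return the one with the highest reward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Every candidate is generated to completion before any scoring happens,
    # so generation cost grows linearly with n.
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # max() keeps the first candidate among ties.
    return max(candidates, key=lambda response: reward(prompt, response))


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["good", "bad", "ok"]

    def toy_generate(prompt: str) -> str:
        # Hypothetical stand-in for an LLM: emits three random words.
        return " ".join(rng.choice(vocab) for _ in range(3))

    def toy_reward(prompt: str, response: str) -> float:
        # Hypothetical stand-in for a reward model: counts the word "good".
        return float(response.split().count("good"))

    print(best_of_n("Explain BoN.", toy_generate, toy_reward, n=8))
```

All `n` candidates are fully generated even when most of them are clearly poor early on. TreeBoN's early pruning targets this overhead.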

Contribution

TreeBoN is a novel framework that combines speculative tree-search with token-level rewards to reduce computational cost while maintaining high response quality.
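In Direct Preference Optimization (DPO), the implicit reward of a response is beta times the log-ratio between the DPO-trained policy and the reference model. This holds up to a prompt-only term that cancels when ranking responses to the same prompt. A sequence log-probability is a sum of per-token log-probabilities, so the same ratio can be evaluated on any prefix. That property makes it usable as a token-level reward for partial responses.

The following is a minimal sketch, assuming the caller already has per-token log-probabilities from both models. The default `beta=0.1` and the `length_normalize` option are hypothetical choices. This page does not say whether TreeBoN weights tokens this way.

```python
import math
from typing import Sequence


def dpo_partial_reward(
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    beta: float = 0.1,  # hypothetical value; DPO's beta is set at training time
    length_normalize: bool = False,  # hypothetical option for comparing prefixes of different lengths
) -> float:
    """beta * sum_t [log pi_dpo(y_t | x, y_<t) - log pi_ref(y_t | x, y_<t)] over a prefix."""
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not policy_logprobs:
        return 0.0  # an empty prefix carries no reward signal
    total = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    if length_normalize:
        total /= len(policy_logprobs)
    return beta * total


if __name__ == "__main__":
    # Made-up log-probabilities for a two-token prefix.
    policy = [math.log(0.5), math.log(0.4)]
    ref = [math.log(0.25), math.log(0.4)]
    print(dpo_partial_reward(policy, ref, beta=1.0))  # log(2), about 0.693
```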

Findings

01. Achieves a 65% win rate on TutorEval.
02. Outperforms standard BoN at the same computational cost.
03. Demonstrates scalability and improved alignment across multiple datasets.

Abstract

Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning, but it must balance computational efficiency against output quality. Best-of-N (BoN) sampling is a simple yet powerful approach that generates multiple responses and selects the best one. It improves performance, but at a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into BoN sampling. TreeBoN maintains a set of parent nodes and iteratively branches from them while pruning low-quality responses. This reduces computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN using AlpacaFarm, HH-RLHF,…
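The branch-and-prune loop described in the abstract can be sketched as a segment-level tree search. This is a simplified illustration under stated assumptions, not the authors' implementation. `extend` is assumed to sample the next segment of a response, and `partial_reward` to score a partial response; a DPO implicit reward on the prefix is one candidate. The names `n_parents`, `n_children` and `n_rounds` are hypothetical parameters, and end-of-sequence handling is omitted.

```python
import random
from typing import Callable, List


def tree_bon(
    prompt: str,
    extend: Callable[[str, str], str],
    partial_reward: Callable[[str, str], float],
    n_parents: int = 4,
    n_children: int = 4,
    n_rounds: int = 3,
) -> str:
    """Hypothetical segment-level tree search: branch every kept parent, then prune by partial reward."""
    if min(n_parents, n_children, n_rounds) < 1:
        raise ValueError("n_parents, n_children and n_rounds must all be at least 1")
    # Start from n_parents empty prefixes so the first round is as wide as later ones.
    parents: List[str] = [""] * n_parents
    for _ in range(n_rounds):
        # Branch: each surviving parent is extended by n_children sampled segments.
        children = [
            parent + extend(prompt, parent)
            for parent in parents
            for _ in range(n_children)
        ]
        # Prune: keep only the n_parents highest-scoring partial responses (stable sort).
        children.sort(key=lambda resp: partial_reward(prompt, resp), reverse=True)
        parents = children[:n_parents]
    # After the last round the survivors are sorted best-first.
    return parents[0]


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_extend(prompt: str, prefix: str) -> str:
        # Hypothetical stand-in for sampling the next segment from an LLM.
        return rng.choice("ab")

    def toy_partial_reward(prompt: str, prefix: str) -> float:
        # Hypothetical stand-in for a partial-response reward: fraction of "a" characters.
        return prefix.count("a") / max(len(prefix), 1)

    print(tree_bon("Explain TreeBoN.", toy_extend, toy_partial_reward))
```

Each round samples `n_parents * n_children` segments. The total number of generated segments therefore matches BoN with `N = n_parents * n_children` responses of `n_rounds` segments each. The difference is that later segments are spent only on prefixes that have scored well so far.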

Taxonomy

Topics: Natural Language Processing Techniques · Time Series Analysis and Forecasting · Machine Learning and Data Classification

Methods: Sparse Evolutionary Training · Pruning