RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs
Jonathan Geuter, Gregor Kornhardt

TL;DR
RoBoN is a sequential, routing-based method that leverages multiple LLMs at inference time to improve response quality over traditional single-model best-of-$n$ approaches, without additional training.
Contribution
RoBoN introduces an online routing mechanism that spreads the best-of-$n$ generation budget across multiple LLMs, improving response accuracy without additional training and at compute parity with single-model best-of-$n$.
Findings
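RoBoN outperforms standard best-of-$n$ applied to each individual model for larger $n$, with absolute gains of up to 3.4%.
RoBoN improves performance across reasoning benchmarks (MATH500, OlympiadBench, MinervaMath, GSM8K, MMLU).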
Diversity among models can be exploited at inference to enhance results.
Abstract
Best-of-$n$ is a widely used test-time scaling approach for LLM inference. Yet despite evidence that LLMs exhibit complementary strengths across tasks, traditionally best-of-$n$ relies on a single model to generate responses. We propose RoBoN (Routed Online Best-of-$n$), a sequential multi-LLM alternative to the prevailing single-model best-of-$n$. Given a suite of models, RoBoN sequentially routes generations one-by-one across models, based on scores computed using a reward model and an agreement signal on the predicted responses. This online routing requires no additional training, keeps compute parity, and works with any plug-in reward model. Across reasoning benchmarks (MATH500, OlympiadBench, MinervaMath, GSM8K, MMLU), RoBoN consistently outperforms standard best-of-$n$ applied to each individual model for larger $n$, with gains of up to 3.4% in absolute…
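The routing idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not state the scoring rule, so the rule used here (a model's best reward so far plus the fraction of its answers that match the current majority answer) is an assumption made purely for illustration. The names `best_of_n`, `routed_best_of_n`, `extract_answer`, and `agreement_weight` are hypothetical, and the models and reward function are plain callables supplied by the caller.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Generator = Callable[[str], str]         # prompt -> one sampled response
RewardFn = Callable[[str, str], float]   # (prompt, response) -> scalar score


def best_of_n(prompt: str, generate: Generator, reward: RewardFn, n: int) -> str:
    """Single-model baseline: draw n responses, return the highest-reward one."""
    responses = [generate(prompt) for _ in range(n)]
    return max(responses, key=lambda r: reward(prompt, r))


def routed_best_of_n(
    prompt: str,
    models: Dict[str, Generator],
    reward: RewardFn,
    n: int,
    extract_answer: Callable[[str], str] = str.strip,  # hypothetical answer parser
    agreement_weight: float = 1.0,                      # hypothetical mixing weight
) -> str:
    """Spend the same budget of n generations, one at a time, across models."""
    if n < len(models):
        raise ValueError("budget n must cover one warm-up sample per model")

    samples: List[Tuple[str, str, float]] = []  # (model name, response, reward)

    def draw(name: str) -> None:
        response = models[name](prompt)
        samples.append((name, response, reward(prompt, response)))

    def routing_score(name: str) -> float:
        # ASSUMED rule, not taken from the paper: best reward seen from this
        # model plus how often its answers match the overall majority answer.
        majority, _ = Counter(extract_answer(r) for _, r, _ in samples).most_common(1)[0]
        own = [(r, s) for m, r, s in samples if m == name]
        best_reward = max(s for _, s in own)
        agreement = sum(extract_answer(r) == majority for r, _ in own) / len(own)
        return best_reward + agreement_weight * agreement

    for name in models:                # warm-up: one sample from every model
        draw(name)
    for _ in range(n - len(models)):   # remaining budget is routed online
        draw(max(models, key=routing_score))

    # Final selection is ordinary best-of-n over the pooled samples.
    return max(samples, key=lambda t: t[2])[1]
```

Both functions issue exactly `n` generation calls, which is how the sketch mirrors the compute-parity claim; the only difference is which model each call is sent to.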
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
