FUSE: Ensembling Verifiers with Zero Labeled Data
Joonhyuk Lee, Virginia Ma, Sarah Zhao, Yash Nair, Asher Spector, Regev Cohen, Emmanuel J. Candès

TL;DR
FUSE is an unsupervised method for ensembling verifiers, such as LLM judges and reward models. It improves the verification of model outputs without any labeled data and typically matches or beats semi-supervised alternatives across benchmarks.
Contribution
Introduces FUSE, a zero-label ensembling technique that controls conditional dependencies between verifiers so that spectral ensembling algorithms perform better without ground truth correctness labels.
Findings
FUSE matches or outperforms semi-supervised methods in diverse benchmarks.
Effective across academic and frontier verification tasks.
Improves verification quality without labeled data.
Abstract
Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground truth acquisition can be time-consuming and expensive. We introduce Fully Unsupervised Score Ensembling (FUSE), a method for improving verification quality by ensembling verifiers without access to ground truth correctness labels. The key idea behind FUSE is to control conditional dependencies between verifiers in a manner that improves the unsupervised performance of a class of spectral algorithms from the ensembling literature. Despite requiring zero ground truth labels, FUSE typically matches or improves upon semi-supervised alternatives in test-time scaling experiments with diverse sets of generator models, verifiers, and benchmarks. In particular,…
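FUSE builds on "a class of spectral algorithms from the ensembling literature", and the abstract does not say which ones. As background only, here is a minimal sketch of one well-known member of that class, the spectral meta-learner of Parisi et al. (2014), applied to binary verifier votes. This is not FUSE. It assumes each verifier's score has already been thresholded to a vote in {-1, +1}, and it assumes the verifiers are conditionally independent given the true label, which is exactly the kind of dependence FUSE is described as controlling. Under that assumption, the off-diagonal vote covariances satisfy Q_ij = (1 - b^2) v_i v_j, where b is the class imbalance and v_i = sensitivity + specificity - 1 for verifier i. The leading eigenvector of the rank-one part of Q therefore recovers weights proportional to v_i, with no labels required. All function names, including the synthetic-data helper `simulate_verifiers`, are hypothetical.

```python
import math
import random


def covariance(preds):
    """Sample covariance of m verifiers' {-1, +1} votes over n items."""
    m, n = len(preds), len(preds[0])
    means = [sum(row) / n for row in preds]
    centered = [[x - mu for x in row] for row, mu in zip(preds, means)]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
             for j in range(m)] for i in range(m)]


def leading_eigenpair(A, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    m = len(A)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
    lam = sum(a * b for a, b in zip(v, Av))  # Rayleigh quotient
    return lam, v


def spectral_weights(preds, rounds=50):
    """Fit Q_ij ~ r_i * r_j on the off-diagonal entries of the covariance.

    The diagonal of Q is not rank-one, so it is repeatedly replaced by
    r_i^2 from the current rank-one fit (a simple imputation scheme).
    """
    A = covariance(preds)
    m = len(A)
    r = [0.0] * m
    for _ in range(rounds):
        lam, v = leading_eigenpair(A)
        r = [math.sqrt(max(lam, 0.0)) * x for x in v]
        for i in range(m):
            A[i][i] = r[i] * r[i]
    # Eigenvectors are sign-ambiguous; assume most verifiers beat chance.
    if sum(r) < 0:
        r = [-x for x in r]
    return r


def spectral_vote(preds, weights):
    """Ensemble label per item: sign(sum_i w_i * f_i(x))."""
    n = len(preds[0])
    return [1 if sum(w * row[k] for w, row in zip(weights, preds)) >= 0 else -1
            for k in range(n)]


def simulate_verifiers(accs, n, seed=0):
    """Hypothetical synthetic data: verifiers that err independently given y."""
    rng = random.Random(seed)
    y = [rng.choice([-1, 1]) for _ in range(n)]
    preds = [[yk if rng.random() < a else -yk for yk in y] for a in accs]
    return y, preds


if __name__ == "__main__":
    y, preds = simulate_verifiers([0.9, 0.8, 0.7, 0.65, 0.6], n=2000)
    w = spectral_weights(preds)  # computed without looking at y
    yhat = spectral_vote(preds, w)
    acc = sum(a == b for a, b in zip(yhat, y)) / len(y)
    print("weights:", [round(x, 3) for x in w], "ensemble accuracy:", acc)
```

The labels `y` are used only to score the result, never to fit the weights. Re-imputing the diagonal is just one simple way to obtain the rank-one fit, and other estimators exist. When verifiers share failure modes, the rank-one structure breaks down, which is the regime the abstract says FUSE targets.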
