TL;DR
LOVER is an unsupervised verifier for LLMs that uses logical rules to improve reasoning without costly labeled data, achieving near-supervised performance.
Contribution
It introduces a verifier regularized by logical rules that learns from unlabeled data and is compatible with any off-the-shelf LLM.
Findings
LOVER outperforms unsupervised baselines on 10 datasets.
LOVER achieves 95% of supervised verifier performance on average.
The source code is publicly available.
Abstract
Verifiers are crucial components for enhancing modern LLMs' reasoning capability. Typical verifiers require resource-intensive supervised dataset construction, which is costly and faces limitations in data diversity. In this paper, we propose LOVER, an unsupervised verifier regularized by logical rules. LOVER treats the verifier as a binary latent variable, utilizing internal activations and enforcing three logical constraints on multiple reasoning paths: negation consistency, intra-group consistency, and inter-group consistency (grouped by the final answer). By incorporating logical rules as priors, LOVER can leverage unlabeled examples and is directly compatible with any off-the-shelf LLMs. Experiments on 10 datasets demonstrate that LOVER significantly outperforms unsupervised baselines, achieving performance comparable to the supervised verifier (reaching its 95% level on average). The…
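The three constraints named in the abstract can be illustrated with a small sketch. The abstract does not give the actual loss terms, so everything below is an assumption made for illustration: the function name `logic_rule_penalties`, the squared-error form of each penalty, and the reading of inter-group consistency as "at most one distinct final answer can be correct". The inputs are hypothetical verifier scores, not outputs of the released code.

```python
from collections import defaultdict


def logic_rule_penalties(p_correct, p_negated, answers):
    """Illustrative penalties for three logical rules (assumed forms, not the paper's).

    p_correct: verifier probability that each reasoning path is correct.
    p_negated: verifier probability for the negated version of each path.
    answers:   final answer of each path, used to group the paths.
    """
    n = len(p_correct)

    # Negation consistency (assumed form): a statement and its negation
    # should receive probabilities that sum to 1.
    negation = sum((p + q - 1.0) ** 2 for p, q in zip(p_correct, p_negated)) / n

    # Group the paths by their final answer.
    groups = defaultdict(list)
    for p, a in zip(p_correct, answers):
        groups[a].append(p)
    means = {a: sum(ps) / len(ps) for a, ps in groups.items()}

    # Intra-group consistency (assumed form): paths that reach the same
    # final answer should be scored similarly, so penalize the spread
    # around each group's mean.
    intra = sum((p - means[a]) ** 2 for p, a in zip(p_correct, answers)) / n

    # Inter-group consistency (assumed form): different final answers are
    # mutually exclusive, so the group means should sum to at most 1.
    inter = max(0.0, sum(means.values()) - 1.0) ** 2

    return {"negation": negation, "intra_group": intra, "inter_group": inter}


# Hypothetical scores for three sampled paths, two of which agree on "42".
penalties = logic_rule_penalties(
    p_correct=[0.9, 0.5, 0.8],
    p_negated=[0.9, 0.5, 0.8],
    answers=["42", "42", "7"],
)
```

In an unsupervised setup of this kind, penalties like these would be summed into a training objective for the verifier, which is why no correctness labels are required. How LOVER weights or parameterizes the terms is not stated in the text above.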
