Likelihood-Based Reward Designs for General LLM Reasoning
Ariel Kwiatkowski, Natasha Butt, Ismail Labiad, Julia Kempe, Yann Ollivier

TL;DR
This paper investigates likelihood-based reward functions for fine-tuning large language models on reasoning tasks, demonstrating their advantages over binary rewards in various settings and establishing them as a robust approach for chain-of-thought learning.
Contribution
The paper systematically compares variants of likelihood-based rewards with standard baselines, showing that they are effective and consistent across verifiable and non-verifiable reasoning benchmarks.
Findings
Log-probability rewards perform well in all setups.
In verifiable settings, log-probability rewards yield better perplexity and success rates.
Probability-based methods like VeriFree struggle in non-verifiable settings.
Abstract
Fine-tuning large language models (LLMs) on reasoning benchmarks via reinforcement learning requires a specific reward function, often binary, for each benchmark. This comes with two potential limitations: the need to design the reward, and the potentially sparse nature of binary rewards. Here, we systematically investigate rewards derived from the probability or log-probability of emitting the reference answer (or any other prompt continuation present in the data), which have the advantage of not relying on specific verifiers and being available at scale. Several recent works have advocated for the use of similar rewards (e.g., VeriFree, JEPO, RLPR, NOVER). We systematically compare variants of likelihood-based rewards with standard baselines, testing performance both on standard mathematical reasoning benchmarks, and on long-form answers where no external verifier is available. We…
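The reward families described in the abstract can be illustrated with a minimal sketch. It assumes the per-token log-probabilities of the reference answer (conditioned on the prompt and a sampled chain of thought) have already been obtained from the model. The function names, the exact-match binary check, and the toy numbers are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Sequence


def log_prob_reward(token_logprobs: Sequence[float]) -> float:
    """Log-probability reward: sum of per-token log-probs of the reference answer."""
    return float(sum(token_logprobs))


def prob_reward(token_logprobs: Sequence[float]) -> float:
    """Probability reward: exponentiated sequence log-probability, in [0, 1]."""
    return math.exp(log_prob_reward(token_logprobs))


def binary_reward(predicted: str, reference: str) -> float:
    """Binary baseline (assumed exact string match after trimming whitespace)."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


# Toy example: a two-token reference answer whose tokens the model assigns
# probabilities 0.5 and 0.25 (made-up values for illustration only).
example_logprobs = [math.log(0.5), math.log(0.25)]
lp = log_prob_reward(example_logprobs)
p = prob_reward(example_logprobs)
```

Unlike the binary check, the two likelihood-based rewards vary continuously with how much probability the model places on the reference answer, so no benchmark-specific verifier is needed and the signal is not all-or-nothing.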
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
