Likelihood-Based Reward Designs for General LLM Reasoning

Ariel Kwiatkowski; Natasha Butt; Ismail Labiad; Julia Kempe; Yann Ollivier

arXiv:2602.03979·cs.CL·February 5, 2026

Open Access

TL;DR

This paper investigates likelihood-based reward functions for fine-tuning large language models on reasoning tasks, demonstrating their advantages over binary rewards in various settings and establishing them as a robust approach for chain-of-thought learning.

Contribution

The paper systematically compares likelihood-based rewards with standard baselines, showing their effectiveness and consistency across verifiable and non-verifiable reasoning benchmarks.

Findings

01 Log-probability rewards perform well in all setups.

02 They yield better perplexity and success rates in verifiable settings.

03 Probability-based methods like VeriFree struggle in non-verifiable settings.

Abstract

Fine-tuning large language models (LLMs) on reasoning benchmarks via reinforcement learning requires a specific reward function, often binary, for each benchmark. This comes with two potential limitations: the need to design the reward, and the potentially sparse nature of binary rewards. Here, we systematically investigate rewards derived from the probability or log-probability of emitting the reference answer (or any other prompt continuation present in the data), which have the advantage of not relying on specific verifiers and being available at scale. Several recent works have advocated for the use of similar rewards (e.g., VeriFree, JEPO, RLPR, NOVER). We systematically compare variants of likelihood-based rewards with standard baselines, testing performance both on standard mathematical reasoning benchmarks, and on long-form answers where no external verifier is available. We…
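The reward families named in the abstract can be illustrated with a minimal sketch. It assumes the per-token log-probabilities of the reference answer (conditioned on the prompt and a sampled chain of thought) have already been obtained from the model. The function names, the exact-match check used for the binary baseline, and the toy numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
from typing import Sequence


def log_prob_reward(token_log_probs: Sequence[float]) -> float:
    """Log-probability reward: sum of per-token log-probs of the reference answer."""
    return math.fsum(token_log_probs)


def prob_reward(token_log_probs: Sequence[float]) -> float:
    """Probability reward: exponential of the summed log-probs."""
    return math.exp(log_prob_reward(token_log_probs))


def binary_reward(predicted: str, reference: str) -> float:
    """Binary baseline (assumed here to be exact string match after stripping)."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


# Toy inputs (hypothetical): every reference token gets probability 0.9.
short_answer = [math.log(0.9)] * 3     # e.g. a short numeric answer
long_answer = [math.log(0.9)] * 200    # e.g. a long-form answer

for name, lps in [("short", short_answer), ("long", long_answer)]:
    print(name, "log-prob:", log_prob_reward(lps), "prob:", prob_reward(lps))
```

In this toy setting the probability of the long answer is many orders of magnitude smaller than that of the short one, while the log-probability only scales linearly with the number of tokens. This is plain arithmetic on the assumed inputs, not a result reported by the authors.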

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques