Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression

Jungsoo Park; Hyungjoo Chae; Ethan Mendes; Jay DeYoung; Varsha Kishore; Wei Xu; Alan Ritter

arXiv:2605.20740 · cs.LG · May 21, 2026

TL;DR

This paper introduces Distribution-Aware Reward, a reinforcement learning method that trains large language models to produce well-calibrated predictive distributions for regression tasks, enhancing uncertainty estimation and ranking.

Contribution

It proposes an RL objective that optimizes the predictive distribution directly, scoring a group of sampled predictions together rather than each decoded number in isolation, and improves calibration and robustness over traditional pointwise methods.

Findings

1. Improves rank-correlation metrics, including a 6-point Spearman gain on KBSS.
2. Achieves competitive molecular property prediction using only SMILES strings.
3. Mitigates rollout diversity collapse and enhances uncertainty diagnostics.

Abstract

Large language models can predict real-valued quantities from heterogeneous inputs such as text, code, and molecular strings, but most training objectives score each decoded floating-point number independently, improving point estimates without ensuring calibrated predictive distributions. This limits applications requiring candidate ranking or uncertainty estimation. We introduce Distribution-Aware Reward, an on-policy reinforcement learning objective whose main contribution is to train language models to produce better predictive distributions for regression tasks, rather than only optimizing individual decoded outputs against scalar targets. Our method treats multiple decoded samples as an empirical predictive distribution, evaluates it with the Continuous Ranked Probability Score, and assigns leave-one-out credit based on each rollout's marginal contribution to distribution quality,…
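The abstract describes the reward in three steps: treat the group of decoded samples as an empirical predictive distribution, score it with the Continuous Ranked Probability Score (CRPS), and credit each rollout by its leave-one-out marginal contribution. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes rollouts have already been parsed into floats, uses the standard energy-form CRPS estimator for an empirical distribution, and defines a rollout's credit as the increase in group CRPS when that rollout is dropped. The function names `empirical_crps` and `leave_one_out_credit` are hypothetical, and the paper's exact estimator, normalization, and use of these credits in the policy-gradient update may differ.

```python
from typing import Sequence


def empirical_crps(samples: Sequence[float], target: float) -> float:
    """CRPS of the empirical distribution over `samples` against a scalar `target`.

    Energy form: E|X - y| - 0.5 * E|X - X'|, where X and X' are drawn
    independently (with replacement) from the samples. Lower is better.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one sample")
    # Accuracy term: mean absolute error of the samples against the target.
    accuracy = sum(abs(x - target) for x in samples) / m
    # Spread term: mean absolute difference over all ordered sample pairs.
    spread = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return accuracy - 0.5 * spread


def leave_one_out_credit(samples: Sequence[float], target: float) -> list[float]:
    """Hypothetical per-rollout credit: how much group CRPS worsens without it.

    Positive credit means the rollout made the group's distribution better;
    negative credit means the group scores better without it.
    """
    if len(samples) < 2:
        raise ValueError("leave-one-out credit needs at least two samples")
    full = empirical_crps(samples, target)
    credits = []
    for i in range(len(samples)):
        rest = list(samples[:i]) + list(samples[i + 1:])
        credits.append(empirical_crps(rest, target) - full)
    return credits
```

For example, with samples `[1.0, 2.0, 3.0, 10.0]` and target `2.0`, the group's CRPS is 0.75; the outlier `10.0` receives negative credit because removing it improves the distribution, while the sample `2.0` that matches the target receives the largest credit. With a single sample the score reduces to absolute error, which corresponds to the pointwise setting the abstract contrasts against.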
