Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
Jungsoo Park, Hyungjoo Chae, Ethan Mendes, Jay DeYoung, Varsha Kishore, Wei Xu, Alan Ritter

TL;DR
This paper introduces Distribution-Aware Reward, a reinforcement learning method that trains large language models to produce well-calibrated predictive distributions for regression tasks, enhancing uncertainty estimation and ranking.
Contribution
The paper proposes an on-policy RL objective that scores a set of decoded samples as a single predictive distribution, rather than scoring each decoded number independently, improving calibration and robustness over pointwise methods.
Findings
Improves rank-correlation metrics, including a 6-point Spearman gain on KBSS.
Achieves competitive molecular property prediction using only SMILES strings.
Mitigates rollout diversity collapse and enhances uncertainty diagnostics.
Abstract
Large language models can predict real-valued quantities from heterogeneous inputs such as text, code, and molecular strings, but most training objectives score each decoded floating-point number independently, improving point estimates without ensuring calibrated predictive distributions. This limits applications requiring candidate ranking or uncertainty estimation. We introduce Distribution-Aware Reward, an on-policy reinforcement learning objective whose main contribution is to train language models to produce better predictive distributions for regression tasks, rather than only optimizing individual decoded outputs against scalar targets. Our method treats multiple decoded samples as an empirical predictive distribution, evaluates it with the Continuous Ranked Probability Score, and assigns leave-one-out credit based on each rollout's marginal contribution to distribution quality,…
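The abstract's core mechanism (score a set of rollouts with the Continuous Ranked Probability Score, then credit each rollout by its marginal contribution) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `empirical_crps` and `leave_one_out_credit` are hypothetical, the energy-form CRPS estimator is one standard choice, and defining credit as "CRPS without the rollout minus CRPS with it" is an assumption, since the abstract is truncated before giving the exact formula.

```python
from typing import List, Sequence


def empirical_crps(samples: Sequence[float], target: float) -> float:
    """CRPS of an empirical distribution against a scalar target.

    Energy form: mean|x_i - y| - 0.5 * mean_{i,j}|x_i - x_j|. Lower is better.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one sample")
    # Average distance from each sample to the target.
    accuracy = sum(abs(x - target) for x in samples) / m
    # Average pairwise distance between samples (includes i == j terms, which are 0).
    spread = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return accuracy - 0.5 * spread


def leave_one_out_credit(samples: Sequence[float], target: float) -> List[float]:
    """Assumed credit rule: how much the CRPS worsens when rollout i is removed.

    Positive credit means the rollout improved the set's predictive distribution.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples for leave-one-out")
    full = empirical_crps(samples, target)
    credits = []
    for i in range(len(samples)):
        rest = list(samples[:i]) + list(samples[i + 1:])
        credits.append(empirical_crps(rest, target) - full)
    return credits


if __name__ == "__main__":
    # Three hypothetical decoded rollouts for one prompt whose true value is 2.0.
    rollouts = [2.0, 2.1, 10.0]
    print(empirical_crps(rollouts, 2.0))
    print(leave_one_out_credit(rollouts, 2.0))
```

Under this assumed rule, the outlier rollout `10.0` receives negative credit because removing it lowers the CRPS, while the two rollouts near the target receive positive credit. Because the spread term is part of the score, a set of identical rollouts is only optimal when all of them hit the target exactly, which is one plausible reading of how a distribution-level reward could discourage rollout diversity collapse.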
