Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning
Ming Chen, Sheng Tang, Rong-Xi Tan, Ziniu Li, Jiacheng Chen, Ke Xue, Chao Qian

TL;DR
This paper applies reinforcement learning to decoding-based regression, using sequence-level rewards to align token-by-token generation with continuous numerical targets. The authors report better predictive accuracy and sampling efficiency.
Contribution
It proposes an RL framework for decoding-based regression that uses sequence-level rewards to enforce numerical coherence across the whole generated number, and reports that it outperforms existing token-level methods.
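The gap between token-level and sequence-level signals can be seen in a toy example. The sketch below is illustrative only: the character-level "tokens", the mismatch count standing in for a token-level loss, and the negative-absolute-error reward are all assumptions made for this example, not the paper's actual tokenization or reward.

```python
def token_mismatches(pred: str, target: str) -> int:
    """Count positions where two equal-length digit strings differ.

    A crude, hypothetical stand-in for a token-level objective.
    """
    if len(pred) != len(target):
        raise ValueError("this toy comparison assumes equal-length strings")
    return sum(p != t for p, t in zip(pred, target))


def sequence_reward(pred: str, target: str) -> float:
    """Hypothetical sequence-level reward: negative absolute error
    of the decoded number."""
    return -abs(float(pred) - float(target))


target = "1.00"
near = "0.99"  # numerically close, but three of four characters differ
far = "9.00"   # numerically far, but only one character differs

for name, pred in [("near", near), ("far", far)]:
    print(name, token_mismatches(pred, target), sequence_reward(pred, target))
```

Under these assumptions the token-level count prefers `far` (one mismatch) over `near` (three mismatches), while the sequence-level reward prefers `near`. This is the kind of misalignment a sequence-level reward is meant to correct.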
Findings
RL improves sampling efficiency and predictive accuracy.
The method outperforms state-of-the-art token-level baselines.
With RL training, decoding-based regression becomes more robust and precise.
Abstract
Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm for applying large language models to numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches relying on token-level constraints often fail to capture the global magnitude of the target value, limiting their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process, utilizing sequence-level rewards to enforce global numerical coherence. Extensive experiments on tabular regression and code metric regression demonstrate that our method (specifically with ReMax and GRPO) consistently…
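As a rough illustration of how a sequence-level reward could feed the two algorithms named in the abstract, the sketch below computes GRPO-style group-normalized advantages and a ReMax-style advantage that subtracts the reward of a greedy decode. The reward function, the penalty for unparseable outputs, the use of population standard deviation, and the sample strings are all assumptions for this example; the paper's actual reward design and hyperparameters are not given in this summary.

```python
import statistics


def reward(decoded: str, target: float) -> float:
    """Assumed sequence-level reward: negative absolute error of the decoded
    number, with a fixed (hypothetical) penalty if it does not parse."""
    try:
        return -abs(float(decoded) - target)
    except ValueError:
        return -1e3


def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each reward by the mean and
    standard deviation of its own sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def remax_advantages(sample_rewards: list[float], greedy_reward: float) -> list[float]:
    """ReMax-style advantages: use the greedy decode's reward as the baseline."""
    return [r - greedy_reward for r in sample_rewards]


target = 3.14
samples = ["3.10", "3.20", "2.90", "4.00"]  # hypothetical sampled decodes
greedy = "3.20"                             # hypothetical greedy decode

rewards = [reward(s, target) for s in samples]
print(grpo_advantages(rewards))
print(remax_advantages(rewards, reward(greedy, target)))
```

In both variants every token of a sampled sequence shares the same advantage, so the learning signal depends on how close the whole decoded number is to the target rather than on per-token agreement.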
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Generative Adversarial Networks and Image Synthesis
