Decoding-based Regression

Xingyou Song; Dara Bahri

arXiv:2501.19383·cs.LG·August 13, 2025

TL;DR

This paper gives theoretical justification for language models' ability to perform regression by decoding numeric predictions as strings. It also finds empirically that decoder-based heads match standard pointwise heads on standard regression tasks while remaining flexible enough for tasks such as density estimation.

Contribution

It introduces a theoretical framework for decoding-based regression and demonstrates the effectiveness of causal sequence decoding models as regression heads.

Findings

01

Decoder-based heads perform comparably to standard pointwise heads on regression tasks.

02

Decoder-based heads are flexible enough to capture smooth numeric distributions, as in the task of density estimation.

03

Theoretical analysis supports the use of decoding for regression in language models.

Abstract

Language models have recently been shown capable of performing regression wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and furthermore investigate the utility of causal sequence decoding models as numeric regression heads given any feature representation. We find that, despite being trained in the usual way - for next-token prediction via cross-entropy loss - decoder-based heads are as performant as standard pointwise heads when benchmarked over standard regression tasks, while being flexible enough to capture smooth numeric distributions, such as in the task of density estimation.
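
The abstract does not say how a number becomes a string of tokens, so the following is only a minimal sketch of one possible scheme. It assumes the target has been normalized to [0, 1) and is written as a fixed-length sequence of base-B digits, with the loss being the usual next-token cross-entropy summed over those digits. The function names (encode, decode, sequence_nll) and the parameters base and length are hypothetical and do not come from the paper or from the optformer repository.

```python
import math


def encode(y, base=10, length=4):
    """Map y in [0, 1) to `length` digit tokens of its base-`base` expansion.

    Assumption: targets are already normalized to [0, 1).
    """
    if not 0.0 <= y < 1.0:
        raise ValueError("y must lie in [0, 1)")
    tokens = []
    for _ in range(length):
        y *= base
        digit = min(int(y), base - 1)  # guard against floating-point overshoot
        tokens.append(digit)
        y -= digit
    return tokens


def decode(tokens, base=10):
    """Map digit tokens back to the left edge of the bin they identify."""
    return sum(t * base ** -(i + 1) for i, t in enumerate(tokens))


def sequence_nll(token_probs, tokens):
    """Cross-entropy of one token sequence.

    token_probs[k] is the (hypothetical) model's distribution over the k-th
    digit given the features and the previous digits; the loss is the sum of
    -log p(t_k | t_<k, features).
    """
    return -sum(math.log(probs[t]) for probs, t in zip(token_probs, tokens))


if __name__ == "__main__":
    tokens = encode(0.375, base=2, length=3)
    print(tokens, decode(tokens, base=2))
    # A model that is uniform over both binary digits at every step.
    uniform = [[0.5, 0.5]] * 3
    print(sequence_nll(uniform, tokens))
```

Under this sketch, each additional digit narrows the predicted value to a bin that is B times smaller, so the product of per-token probabilities acts as a histogram over [0, 1) with B^length bins. That is one way to see how a head trained only with cross-entropy could represent a full distribution over the target rather than a single point estimate.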

Code & Models

Repositories

google-research/optformer
jax · Official

Taxonomy

Topics: Neural Networks and Applications