Decoding-based Regression
Xingyou Song, Dara Bahri

TL;DR
This paper provides theoretical grounds for language models' ability to perform regression by decoding numeric predictions as strings. It shows that decoder-based heads perform as well as standard pointwise heads on regression benchmarks while remaining flexible enough for tasks such as density estimation.
Contribution
It provides theoretical grounds for decoding-based regression and evaluates causal sequence decoding models as numeric regression heads on top of any feature representation.
Findings
Decoder-based heads perform comparably to standard pointwise heads on regression tasks.
Decoder-based heads are flexible enough to capture smooth numeric distributions, for example in density estimation tasks.
Theoretical analysis supports the use of decoding for regression in language models.
Abstract
Language models have recently been shown capable of performing regression wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and furthermore investigate the utility of causal sequence decoding models as numeric regression heads given any feature representation. We find that, despite being trained in the usual way - for next-token prediction via cross-entropy loss - decoder-based heads are as performant as standard pointwise heads when benchmarked over standard regression tasks, while being flexible enough to capture smooth numeric distributions, such as in the task of density estimation.
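To make "numeric predictions are represented as decoded strings" concrete, here is a minimal sketch of one way a target could be turned into a token sequence and back. The fixed-length base-`base` digit scheme, the restriction to targets in [0, 1), and the function names `encode` and `decode` are assumptions for illustration only; the abstract does not specify the paper's tokenization.

```python
def encode(y, base=2, length=8):
    """Map y in [0, 1) to `length` digit tokens in the given base (hypothetical scheme)."""
    if not 0.0 <= y < 1.0:
        raise ValueError("y must lie in [0, 1)")
    tokens = []
    for _ in range(length):
        y *= base
        digit = int(y)        # next most-significant digit
        tokens.append(digit)
        y -= digit            # keep the fractional remainder
    return tokens


def decode(tokens, base=2):
    """Map digit tokens back to the left edge of the bin they identify."""
    return sum(d * base ** -(i + 1) for i, d in enumerate(tokens))


tokens = encode(0.625, base=2, length=3)
value = decode(tokens, base=2)
```

Under such a scheme, a decoder head would be trained with ordinary next-token cross-entropy over the digit tokens. Sampling token sequences and decoding them then yields samples from a distribution over numeric values rather than a single point estimate.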
Taxonomy
Topics: Neural Networks and Applications
