On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions
Denys Pushkin, Raphaël Berthier, Emmanuel Abbe

TL;DR
This paper analyzes how random feature models and Transformers generalize to parts of the input domain that were not seen during training. In certain regimes they learn minimal-degree interpolators, and the outcome can differ between Boolean and non-Boolean data.
Contribution
The paper proves that, in the generalization on the unseen (GOTU) setting, RF models in the small feature regime converge to minimal-degree interpolators. It also examines how the data embedding affects the degree of the learned interpolator.
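To make the notion of a low-degree interpolator concrete, here is a minimal sketch of a Boolean toy case. It is not the paper's construction: the target `x1 * x2`, the seen domain `x1 = +1`, and the helper names `monomials`, `design`, and `lowest_degree_interpolator` are all assumptions chosen for illustration. As a simplification of "minimal degree", the code picks the smallest maximum degree that fits the seen data and then takes the minimum-norm coefficients.

```python
import itertools
import numpy as np

def monomials(n, max_degree):
    # Index sets S of {0, ..., n-1} with |S| <= max_degree, lowest degree first.
    return [S for d in range(max_degree + 1)
            for S in itertools.combinations(range(n), d)]

def design(X, subsets):
    # Column for S holds prod_{i in S} x_i; the empty product is 1.
    cols = [np.prod(X[:, np.array(S, dtype=int)], axis=1) for S in subsets]
    return np.stack(cols, axis=1)

def lowest_degree_interpolator(X_seen, y_seen):
    # Hypothetical helper: try degree caps 0, 1, ..., n and return the first
    # cap whose minimum-norm least-squares fit interpolates the seen data.
    n = X_seen.shape[1]
    for D in range(n + 1):
        subsets = monomials(n, D)
        A = design(X_seen, subsets)
        coef, *_ = np.linalg.lstsq(A, y_seen, rcond=None)
        if np.allclose(A @ coef, y_seen):
            return D, subsets, coef
    raise ValueError("no multilinear interpolator found")

# Assumed toy target on {-1, +1}^2.
target = lambda X: X[:, 0] * X[:, 1]

# Seen part of the domain: x1 is frozen to +1. Unseen part: x1 = -1.
X_seen = np.array([[1.0, 1.0], [1.0, -1.0]])
X_unseen = np.array([[-1.0, 1.0], [-1.0, -1.0]])

D, subsets, coef = lowest_degree_interpolator(X_seen, target(X_seen))
pred_unseen = design(X_unseen, subsets) @ coef
print(D, dict(zip(subsets, np.round(coef, 6))))
print(pred_unseen, target(X_unseen))
```

In this toy case the search stops at degree 1 with the interpolator `x2`, which matches the target on the seen half of the cube but has the opposite sign on the unseen half. That is the kind of out-of-domain behavior the GOTU setting is designed to expose.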
Findings
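RF models in the small feature regime converge to minimal-degree interpolators in the GOTU setting.
Embedding q-ary data tokens as roots of unity leads to minimal-degree interpolation, as in the Boolean case.
Non-Boolean data that is not embedded this way (e.g., as integers) may result in non-minimal-degree interpolators.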
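The two embeddings mentioned above can be compared with a short sketch. This is an illustration under stated assumptions, not the paper's code or argument: the function names `embed_roots_of_unity` and `embed_integers` and the choice `q = 3` are hypothetical. The sketch only checks elementary facts: roots of unity satisfy `z**q == 1` and have unit modulus, and for `q = 2` they reduce to the Boolean values `+1` and `-1`, whereas an integer embedding has no such identity.

```python
import numpy as np

def embed_roots_of_unity(tokens, q):
    # Map token k in {0, ..., q-1} to the complex number exp(2*pi*i*k/q).
    return np.exp(2j * np.pi * np.asarray(tokens) / q)

def embed_integers(tokens):
    # Map token k to the real number k (the "simply as integers" embedding).
    return np.asarray(tokens, dtype=float)

q = 3  # assumed alphabet size, for illustration only
tokens = np.arange(q)

z = embed_roots_of_unity(tokens, q)
k = embed_integers(tokens)

# Every root of unity satisfies z**q = 1, so powers of z repeat with period q,
# the analogue of x**2 = 1 for Boolean x in {-1, +1}.
print(np.allclose(z**q, 1))

# Integers satisfy no such identity in general.
print(np.allclose(k**q, 1))

# With q = 2 the root-of-unity embedding is exactly the Boolean {+1, -1}.
print(np.round(embed_roots_of_unity([0, 1], 2).real))
```

The identity `z**q == 1` means that monomials in root-of-unity coordinates can always be reduced to exponents below `q`, mirroring the multilinear reduction available for Boolean inputs. Whether this is the mechanism behind the paper's result is not stated in the summary above, so treat it as context rather than as the authors' proof.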
Abstract
We investigate the out-of-domain generalization of random feature (RF) models and Transformers. We first prove that in the "generalization on the unseen (GOTU)" setting, where training data is fully seen in some part of the domain but testing is made on another part, and for RF models in the small feature regime, the convergence takes place to interpolators of minimal degree as in the Boolean case (Abbe et al., 2023). We then consider the sparse target regime and explain how this regime relates to the small feature regime, but with a different regularization term that can alter the picture in the non-Boolean case. We show two different outcomes for the sparse regime with q-ary data tokens: (1) if the data is embedded with roots of unity, then a min-degree interpolator is learned like in the Boolean case for RF models, (2) if the data is not embedded as such, e.g., simply as integers,…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Geophysical Methods and Applications · Machine Learning and ELM
