What shapes feature representations? Exploring datasets, architectures, and training
Katherine L. Hermann, Andrew K. Lampinen

TL;DR
This study investigates how datasets, architectures, and training shape the features a model represents. It shows that models preferentially represent some predictive features over others, and that feature difficulty affects which features are learned.
Contribution
The paper uses synthetic datasets in which the task relevance of each input feature is controlled directly, making it possible to analyze which features a model represents and how feature difficulty and task relevance affect them.
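As a rough illustration of what controlling task relevance can look like, here is a minimal sketch. It is not the authors' dataset: the binary features, the function name `make_dataset`, and the parameters `p_a` and `p_b` are all assumptions made for this example. Each feature agrees with the label with a chosen probability, so 1.0 makes it fully predictive and 0.5 makes it task-irrelevant.

```python
import numpy as np

def make_dataset(n, p_a, p_b, seed=0):
    """Hypothetical toy generator: two binary features with tunable task relevance.

    p_a and p_b are the probabilities that feature a and feature b agree with
    the label (1.0 = fully predictive, 0.5 = unrelated to the label).
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)  # binary labels
    # Each feature copies the label with probability p, otherwise takes the opposite value.
    a = np.where(rng.random(n) < p_a, y, 1 - y)
    b = np.where(rng.random(n) < p_b, y, 1 - y)
    return a, b, y

# Feature a redundantly predicts the label; feature b carries no label information.
a, b, y = make_dataset(10_000, p_a=1.0, p_b=0.5)
print("agreement of a with label:", (a == y).mean())
print("agreement of b with label:", (b == y).mean())
```

Setting both probabilities to 1.0 gives the redundant case discussed in the abstract, where either feature alone suffices to solve the task.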
Findings
When two features redundantly predict the labels, models preferentially represent the one that is more linearly decodable from the untrained model.
Training enhances task-relevant features and partially suppresses task-irrelevant ones.
Easy features lead to more consistent and similar representations across models.
Abstract
In naturalistic learning problems, a model's input contains a wide range of features, some useful for the task at hand, and others not. Of the useful features, which ones does the model use? Of the task-irrelevant features, which ones does the model represent? Answers to these questions are important for understanding the basis of models' decisions, as well as for building models that learn versatile, adaptable representations useful beyond the original training task. We study these questions using synthetic datasets in which the task-relevance of input features can be controlled directly. We find that when two features redundantly predict the labels, the model preferentially represents one, and its preference reflects what was most linearly decodable from the untrained model. Over training, task-relevant features are enhanced, and task-irrelevant features are partially suppressed.…
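Linear decodability from an untrained model can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the one-layer random ReLU network, the input encoding (feature `a` written directly into one input dimension, feature `b` defined as the XOR of two other dimensions), the least-squares probe, and the names `probe_accuracy` and `decodability_from_untrained` are all hypothetical choices made for this example.

```python
import numpy as np

def probe_accuracy(h_train, t_train, h_test, t_test):
    """Fit a least-squares linear probe on h_train and return held-out accuracy."""
    add_bias = lambda h: np.hstack([h, np.ones((len(h), 1))])
    # Regress onto +/-1 targets, then classify by the sign of the prediction.
    w, *_ = np.linalg.lstsq(add_bias(h_train), 2.0 * t_train - 1.0, rcond=None)
    pred = (add_bias(h_test) @ w > 0).astype(int)
    return float((pred == t_test).mean())

def decodability_from_untrained(n=4000, hidden=256, seed=0):
    """Probe two features from the hidden layer of a randomly initialized network."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=n)   # feature a: written directly into input dim 0
    s1 = rng.integers(0, 2, size=n)
    s2 = rng.integers(0, 2, size=n)
    b = s1 ^ s2                      # feature b: XOR of input dims 1 and 2
    noise = rng.normal(size=(n, 3))  # three distractor dimensions
    x = np.column_stack([2 * a - 1, 2 * s1 - 1, 2 * s2 - 1, noise]).astype(float)

    # Untrained model: random Gaussian weights, ReLU, no gradient steps at all.
    w_in = rng.normal(size=(x.shape[1], hidden))
    h = np.maximum(x @ w_in, 0.0)

    half = n // 2
    return {
        "a": probe_accuracy(h[:half], a[:half], h[half:], a[half:]),
        "b": probe_accuracy(h[:half], b[:half], h[half:], b[half:]),
    }

print(decodability_from_untrained())
```

Which feature comes out more decodable depends on the encoding, the architecture, and the initialization, so the numbers printed here say nothing about the paper's results. The point is only the measurement: freeze a randomly initialized network, fit a linear readout per feature, and compare held-out accuracies.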
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
