Temporal Task Diversity: Inductive Biases Under Non-Stationarity in Synthetic Sequence Modelling
Afiq Abdillah Effiezal Aswadi, Oliver Britton, Ross Baker, Matthew Farrugia-Roberts

TL;DR
This paper investigates how non-stationary data distributions during training influence the inductive biases of deep learning models, specifically in synthetic sequence modelling, and finds that temporal task diversity promotes generalisation over memorisation.
Contribution
It introduces the concept of temporal task diversity (diversifying the task distribution across training time) and shows that it shifts the inductive biases of models trained on synthetic sequence modelling towards generalisation over memorisation.
Findings
Temporal diversity increases bias towards generalisation.
Diverse training tasks lead to models that memorise less.
Small transformers exhibit different generalisation patterns depending on training task diversity.
Abstract
Modern deep learning science often assumes that neural networks learn from a fixed data distribution. However, many practically important learning problems involve data distributions that change throughout training. How does such non-stationarity impact the inductive biases of deep learning towards models with different structural, generalisation, and safety properties? A fruitful testbed for studying inductive bias is in-context linear regression sequence modelling, where small transformers display strikingly different generalisation patterns depending on the diversity of the (fixed) training task distribution. In this paper, we explore the effect of diversifying the task distribution across training time, finding that such temporal diversity leads to an increased bias towards generalisation over memorisation.
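The in-context linear regression testbed can be made concrete with a small data generator. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes the common setup in which each sequence uses one weight vector w drawn uniformly from a finite pool of tasks (w ~ N(0, I)), with inputs x ~ N(0, I) and targets y = <w, x> + noise. The "temporal diversity" schedule, in which a fresh pool is drawn for each training phase, is a hypothetical illustration only; this summary does not specify how the paper diversifies the task distribution over time. All function names and parameters are hypothetical.

```python
import numpy as np

def sample_task_pool(num_tasks, dim, rng):
    """Draw a finite pool of regression weight vectors w ~ N(0, I_dim)."""
    return rng.standard_normal((num_tasks, dim))

def sample_batch(task_pool, batch_size, seq_len, noise_std, rng):
    """Build in-context linear regression sequences from a task pool.

    Each sequence picks one task w uniformly from the pool, draws inputs
    x_i ~ N(0, I) and targets y_i = <w, x_i> + Gaussian noise.
    Returns inputs, targets, and the task vector used for each sequence.
    """
    num_tasks, dim = task_pool.shape
    idx = rng.integers(0, num_tasks, size=batch_size)
    w = task_pool[idx]                                    # (batch, dim)
    xs = rng.standard_normal((batch_size, seq_len, dim))  # (batch, seq, dim)
    ys = np.einsum("bsd,bd->bs", xs, w)                   # (batch, seq)
    ys = ys + noise_std * rng.standard_normal(ys.shape)
    return xs, ys, w

def temporally_diverse_pools(num_phases, num_tasks, dim, rng):
    """Hypothetical schedule: draw a fresh pool of num_tasks tasks per phase.

    At any moment the training distribution contains only num_tasks tasks,
    but across all of training the model sees num_phases * num_tasks tasks.
    """
    return [sample_task_pool(num_tasks, dim, rng) for _ in range(num_phases)]

def training_batches(total_steps, pools, batch_size, seq_len, noise_std, rng):
    """Yield (phase, batch) pairs, moving to the next pool at equal intervals."""
    steps_per_phase = max(1, total_steps // len(pools))
    for step in range(total_steps):
        phase = min(step // steps_per_phase, len(pools) - 1)
        yield phase, sample_batch(pools[phase], batch_size, seq_len, noise_std, rng)

# Usage: 4 phases of 8 tasks each, in 5 dimensions.
rng = np.random.default_rng(0)
pools = temporally_diverse_pools(num_phases=4, num_tasks=8, dim=5, rng=rng)
for phase, (xs, ys, w) in training_batches(100, pools, 16, 10, 0.1, rng):
    pass  # a model update on (xs, ys) would go here
```

Comparing this schedule against a stationary baseline that reuses one pool for all steps isolates the effect of changing the task set over time while holding the instantaneous number of tasks fixed.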
