Temporal Task Diversity: Inductive Biases Under Non-Stationarity in Synthetic Sequence Modelling

Afiq Abdillah Effiezal Aswadi; Oliver Britton; Ross Baker; Matthew Farrugia-Roberts

arXiv:2605.18281·cs.LG·May 19, 2026

TL;DR

This paper investigates how non-stationary training data distributions shape the inductive biases of deep learning models. Using in-context linear regression sequence modelling with small transformers as a testbed, it finds that diversifying the task distribution across training time increases the bias towards generalisation over memorisation.

Contribution

It introduces the concept of temporal task diversity (diversifying the task distribution across training time) and shows that, in synthetic sequence modelling, it biases models towards generalisation over memorisation.

Findings

01

Temporal task diversity increases the bias towards generalisation over memorisation.

02

Models trained on temporally diverse tasks memorise less.

03

Small transformers exhibit strikingly different generalisation patterns depending on the diversity of the (fixed) training task distribution.

Abstract

Modern deep learning science often assumes that neural networks learn from a fixed data distribution. However, many practically important learning problems involve data distributions that change throughout training. How does such non-stationarity impact the inductive biases of deep learning towards models with different structural, generalisation, and safety properties? A fruitful testbed for studying inductive bias is in-context linear regression sequence modelling, where small transformers display strikingly different generalisation patterns depending on the diversity of the (fixed) training task distribution. In this paper, we explore the effect of diversifying the task distribution across training time, finding that such temporal diversity leads to an increased bias towards generalisation over memorisation.
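To make the contrast between a fixed and a time-varying task distribution concrete, here is a minimal sketch of a data stream for in-context linear regression. It is not the authors' setup: the abstract does not say how temporal diversity is implemented, so the choices here are assumptions made purely for illustration. These include the Gaussian task pool, the noise level, the periodic redraw controlled by the hypothetical parameter `resample_every`, and every function name.

```python
import numpy as np

def sample_task_pool(rng, num_tasks, dim):
    """Draw a finite pool of regression weight vectors (the 'tasks').
    Assumption: tasks are i.i.d. standard Gaussian vectors."""
    return rng.standard_normal((num_tasks, dim))

def sample_batch(rng, task_pool, batch_size, seq_len, noise_std=0.1):
    """Sample in-context regression sequences, one pooled task per sequence."""
    num_tasks, dim = task_pool.shape
    task_ids = rng.integers(0, num_tasks, size=batch_size)
    w = task_pool[task_ids]                              # (batch, dim)
    x = rng.standard_normal((batch_size, seq_len, dim))  # (batch, seq, dim)
    noise = noise_std * rng.standard_normal((batch_size, seq_len))
    y = np.einsum("bsd,bd->bs", x, w) + noise            # (batch, seq)
    return x, y, task_ids

def training_stream(rng, num_steps, num_tasks, dim, batch_size, seq_len,
                    resample_every=None):
    """Yield (step, pool, x, y) for each training step.

    resample_every=None keeps one task pool for the whole run (stationary).
    resample_every=k redraws the pool every k steps, which is one
    hypothetical way to diversify tasks across training time.
    """
    pool = sample_task_pool(rng, num_tasks, dim)
    for step in range(num_steps):
        if resample_every is not None and step > 0 and step % resample_every == 0:
            pool = sample_task_pool(rng, num_tasks, dim)
        x, y, _ = sample_batch(rng, pool, batch_size, seq_len)
        yield step, pool.copy(), x, y
```

With `resample_every=None` the model sees the same finite set of tasks at every step, so memorising that set is a viable strategy. With a finite `resample_every`, the number of tasks at any one time is unchanged but the set drifts over training, which is the kind of non-stationarity the abstract describes.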
