Out-of-distribution generalisation is hard: evidence from ARC-like tasks
George Dimitriadis, Spyridon Samothrakis

TL;DR
This paper investigates the challenges of out-of-distribution (OOD) generalisation, demonstrating that even models that succeed on OOD tests may not have learnt truly compositional features, and introduces two novel architectures with built-in biases that improve OOD performance.
Contribution
The paper argues that OOD test performance alone is not enough: the compositionality of the learnt features must also be verified. It presents two new neural network architectures whose built-in biases are designed to make them succeed in OOD settings.
Findings
Common neural networks fail on clearly defined OOD tasks.
The two new architectures, designed with built-in biases, achieve better OOD performance.
Successful OOD performance does not guarantee learning of compositional features.
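To make the idea of a clearly defined OOD test concrete, here is a minimal sketch of a compositional held-out split. It is not the paper's tasks or code: the factors (shapes and colours), the function name compositional_split, and the held-out pairs are all hypothetical, chosen only for illustration. Every individual factor value appears in training, but some combinations are reserved for testing, so a model that only interpolates between seen examples has no direct coverage of the test set.

```python
from itertools import product


def compositional_split(shapes, colours, held_out):
    """Split all (shape, colour) pairs into a training set and an OOD test set.

    Hypothetical illustration: each shape and each colour must still occur
    somewhere in training, but the held-out combinations never do.
    """
    held_out = set(held_out)
    all_pairs = list(product(shapes, colours))
    train = [pair for pair in all_pairs if pair not in held_out]
    test = [pair for pair in all_pairs if pair in held_out]

    # Guard: the split is only compositional if every factor value is seen in training.
    seen_shapes = {shape for shape, _ in train}
    seen_colours = {colour for _, colour in train}
    if seen_shapes != set(shapes) or seen_colours != set(colours):
        raise ValueError("a factor value never appears in the training set")
    return train, test


shapes = ["square", "circle", "triangle"]
colours = ["red", "green", "blue"]
train_pairs, ood_pairs = compositional_split(
    shapes, colours, held_out=[("square", "red"), ("circle", "blue")]
)
```

As the findings above note, passing a test built this way would still not show that the learnt features are compositional; that has to be checked separately.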
Abstract
Out-of-distribution (OOD) generalisation is considered a hallmark of human and animal intelligence. To achieve OOD generalisation through composition, a system must discover the environment-invariant properties of experienced input-output mappings and transfer them to novel inputs. This can be realised if an intelligent system can identify appropriate, task-invariant, and composable input features, as well as the composition methods, thus allowing it to act based not on interpolation between learnt data points but on the task-invariant composition of those features. We propose that, to confirm that an algorithm does indeed learn compositional structures from data, it is not enough to test on an OOD setup; one also needs to confirm that the features identified are indeed compositional. We showcase this by exploring two tasks with clearly defined OOD metrics that are not OOD…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Neural Networks and Applications · Face Recognition and Perception
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Byte Pair Encoding · Softmax · Absolute Position Encodings · Residual Connection
