Data Distributional Properties As Inductive Bias for Systematic Generalization

Felipe del Rio; Alain Raymond-Saez; Daniel Florea; Rodrigo Toro Icarte; Julio Hurtado; Cristian B. Calderon; Alvaro Soto

arXiv:2502.20499 · cs.LG · June 19, 2025

TL;DR

This paper investigates how three properties of the training data distribution (diversity, burstiness, and latent intervention) act as inductive biases that significantly improve systematic generalization in a multi-modal language model, and it examines the role of normalized mutual information (NMI) and representation geometry.

Contribution

The paper introduces and empirically evaluates three data properties as inductive biases for systematic generalization, and it highlights the role of NMI and representation geometry in out-of-distribution performance.

Findings

01

All three data properties significantly enhance systematic generalization.

02

Diversity increases accuracy by 89% in affected properties.

03

Lower NMI correlates with better out-of-distribution generalization.
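
To make the NMI finding concrete, here is a minimal sketch of normalized mutual information between two discrete latent attributes, written with the Python standard library only. The function names and the toy attribute lists are hypothetical, and normalizing by the arithmetic mean of the two entropies is an assumption; this page does not say which normalization or which pair of variables the paper uses.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(xs, ys):
    """NMI between two equally long sequences of discrete labels.

    Assumption: mutual information is divided by the arithmetic mean of
    the two marginal entropies. Other normalizations exist.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) )
    mi = sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )
    denom = 0.5 * (entropy(xs) + entropy(ys))
    return mi / denom if denom > 0 else 0.0


# Hypothetical toy attributes: shape and color of four objects.
shapes = ["cube", "cube", "ball", "ball"]
coupled_colors = ["red", "red", "blue", "blue"]    # color determined by shape
balanced_colors = ["red", "blue", "red", "blue"]   # color independent of shape

nmi_coupled = normalized_mutual_information(shapes, coupled_colors)
nmi_balanced = normalized_mutual_information(shapes, balanced_colors)
```

In this toy case the coupled attributes give an NMI of 1 and the balanced ones give 0 (up to floating-point error), which illustrates what "lower NMI" means for a pair of latent attributes.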

Abstract

Deep neural networks (DNNs) struggle with systematic generalization (SG). Several studies have explored promoting SG through novel architectures, loss functions, or training methodologies. Few studies, however, have focused on the role of training data properties in promoting SG. In this work, we investigate the impact of certain data distributional properties as inductive biases for the SG ability of a multi-modal language model. To this end, we study three different properties. First, data diversity, instantiated as an increase in the number of possible values a latent property in the training distribution may take. Second, burstiness, where we probabilistically restrict the number of possible values of latent factors on particular inputs during training. Third, latent intervention, where a particular latent factor is altered randomly during training. We find…
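
The three properties described in the abstract can be sketched as sampling rules over a toy scene generator. This is an illustrative sketch under stated assumptions, not the paper's data pipeline: the attribute vocabularies, the scene representation, and all function and parameter names (`diverse_colors`, `sample_bursty_scene`, `intervene_on_color`, `p_burst`, `max_values`) are hypothetical.

```python
import random

# Hypothetical latent vocabularies; the paper's actual latents may differ.
COLORS = ["red", "blue", "green", "yellow", "purple", "cyan", "gray", "brown"]
SHAPES = ["cube", "sphere", "cylinder"]


def sample_scene(rng, colors, n_objects=3):
    """Sample a scene as a list of objects with a shape and a color latent."""
    return [
        {"shape": rng.choice(SHAPES), "color": rng.choice(colors)}
        for _ in range(n_objects)
    ]


def diverse_colors(k):
    """Diversity: let the color latent take k possible values in training."""
    return COLORS[:k]


def sample_bursty_scene(rng, colors, p_burst, max_values, n_objects=3):
    """Burstiness: with probability p_burst, restrict this one scene to at
    most max_values distinct colors; otherwise sample from all colors."""
    if rng.random() < p_burst:
        colors = rng.sample(colors, min(max_values, len(colors)))
    return sample_scene(rng, colors, n_objects)


def intervene_on_color(rng, scene, colors):
    """Latent intervention: replace the color latent of every object with a
    random draw, leaving the other latents untouched."""
    return [{**obj, "color": rng.choice(colors)} for obj in scene]


rng = random.Random(0)
train_colors = diverse_colors(6)
bursty_scene = sample_bursty_scene(
    rng, train_colors, p_burst=1.0, max_values=2, n_objects=10
)
intervened_scene = intervene_on_color(rng, bursty_scene, train_colors)
```

Here `p_burst=1.0` forces the restriction so its effect is visible in a single sample; a training set would presumably mix restricted and unrestricted inputs, since the abstract describes the restriction as probabilistic.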
