You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, George Wang, Liam Carroll, Daniel Murfet

TL;DR
This position paper argues that understanding how structure in the data distribution relates to internal structure in trained models is central to AI alignment. It highlights the limits of current evaluation methods and calls for statistical foundations that connect data, model structure, and generalisation.
Contribution
It makes the case that studying the relation between data structure and model structure is needed to move AI alignment beyond standard evaluation and toward a mathematical science.
Findings
Equivalent training performance does not imply similar internal computation.
Current evaluation methods are insufficient for safety assurances.
Statistical foundations linking data, model structure, and generalisation are needed.
Abstract
In this position paper, we argue that understanding the relation between structure in the data distribution and structure in trained models is central to AI alignment. First, we discuss how two neural networks can have equivalent performance on the training set but compute their outputs in essentially different ways and thus generalise differently. For this reason, standard testing and evaluation are insufficient for obtaining assurances of safety for widely deployed generally intelligent systems. We argue that to progress beyond evaluation to a robust mathematical science of AI alignment, we need to develop statistical foundations for an understanding of the relation between structure in the data distribution, internal structure in models, and how these structures underlie generalisation.
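The first claim of the abstract can be illustrated with a toy sketch. The example below is not from the paper: the two hand-written classifiers, the feature layout, and the "shifted" dataset are all hypothetical, chosen only to show how two models can agree on every training point while relying on different features, and then diverge once those features stop moving together.

```python
# Toy illustration (an assumption for exposition, not the authors' method):
# two classifiers with identical training accuracy but different internals.

def model_a(x):
    """Predicts the label from the first feature only."""
    return 1 if x[0] > 0 else 0

def model_b(x):
    """Predicts the label from the second feature only."""
    return 1 if x[1] > 0 else 0

def accuracy(model, data):
    """Fraction of (input, label) pairs the model classifies correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

# Training data: both features always have the same sign as the label,
# so either feature alone is enough to fit the training set.
train = [((1.0, 1.0), 1), ((2.0, 0.5), 1), ((-1.0, -1.0), 0), ((-0.5, -2.0), 0)]

# Shifted data: the label still follows the first feature,
# but the second feature now has the opposite sign.
shifted = [((1.0, -1.0), 1), ((2.0, -0.5), 1), ((-1.0, 1.0), 0), ((-0.5, 2.0), 0)]

train_acc = (accuracy(model_a, train), accuracy(model_b, train))
shifted_acc = (accuracy(model_a, shifted), accuracy(model_b, shifted))

print("train accuracy (A, B):  ", train_acc)
print("shifted accuracy (A, B):", shifted_acc)
```

Evaluating only on data that resembles the training set cannot tell these two models apart, which is the sense in which equal training performance does not determine how a model generalises.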
Taxonomy
Topics: AI-based Problem Solving and Planning · Data Mining Algorithms and Applications · Neural Networks and Applications
