On the Origins of the Block Structure Phenomenon in Neural Network Representations
Thao Nguyen, Maithra Raghu, Simon Kornblith

TL;DR
This paper investigates the block structure phenomenon in neural network representations and traces it to dominant datapoints, a small group of examples that share similar image statistics. The set of dominant datapoints differs across random seeds, which has implications for understanding learned representations and the effects of training.
Contribution
It traces the origin of the block structure to dominant datapoints and analyzes how the structure evolves and how it depends on training methods and randomness.
Findings
The block structure arises from dominant datapoints: a small group of examples that share similar image statistics (e.g. background color).
The dominant datapoints and shared features vary across random seeds.
Interventions can eliminate the block structure, affecting training dynamics.
Abstract
Recent work has uncovered a striking phenomenon in large-capacity neural networks: they contain blocks of contiguous hidden layers with highly similar representations. This block structure has two seemingly contradictory properties: on the one hand, its constituent layers exhibit highly similar dominant first principal components (PCs), but on the other hand, their representations, and their common first PC, are highly dissimilar across different random seeds. Our work seeks to reconcile these discrepant properties by investigating the origin of the block structure in relation to the data and training methods. By analyzing properties of the dominant PCs, we find that the block structure arises from dominant datapoints - a small group of examples that share similar image statistics (e.g. background color). However, the set of dominant datapoints, and the precise shared image statistic,…
Taxonomy
Topics: Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks
