Studying Cross-cluster Modularity in Neural Networks
Satvik Golechha, Maheep Chaudhary, Joan Velja, Alessandro Abate, Nandi Schoots

TL;DR
This paper defines a clusterability measure for neural networks, shows via spectral graph clustering that pre-trained models form highly enmeshed clusters, and trains models with a clusterability loss to make them more modular. It then examines the properties of the resulting models across CNNs, small transformers, and pre-trained language models.
Contribution
It defines a new clusterability measure, develops a clusterability loss to promote modularity, and analyzes the properties of resulting clustered models across multiple neural network types.
Findings
Pre-trained models form highly enmeshed clusters.
Training with clusterability loss produces more modular models.
Clustered models form smaller circuits but do not exhibit more task specialization.
Abstract
An approach to improve neural network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We define a measure for clusterability and show that pre-trained models form highly enmeshed clusters via spectral graph clustering. We thus train models to be more modular using a "clusterability loss" function that encourages the formation of non-interacting clusters. We then investigate the emerging properties of these highly clustered models. We find our trained clustered models do not exhibit more task specialization, but do form smaller circuits. We investigate CNNs trained on MNIST and CIFAR, small transformers trained on modular addition, and GPT-2 and Pythia on the Wiki dataset, and Gemma on a Chemistry dataset. This investigation shows what to expect from clustered models.
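The abstract does not spell out the clusterability measure or the loss, so the following is only a minimal sketch under stated assumptions. It assumes clusterability is the fraction of squared weight mass on connections that stay inside a cluster, and that the loss is one minus that fraction. The function names `clusterability` and `clusterability_loss`, the cluster-label arguments, and the squared-weight choice are all illustrative and not taken from the paper.

```python
import numpy as np


def clusterability(weights, in_clusters, out_clusters):
    """Assumed measure: share of squared weight mass on intra-cluster edges.

    weights      : (n_out, n_in) weight matrix of one layer
    in_clusters  : length n_in sequence of cluster ids for input units
    out_clusters : length n_out sequence of cluster ids for output units
    """
    weights = np.asarray(weights, dtype=float)
    out_ids = np.asarray(out_clusters)[:, None]
    in_ids = np.asarray(in_clusters)[None, :]
    # same_cluster[i, j] is True when output unit i and input unit j
    # carry the same cluster id.
    same_cluster = out_ids == in_ids
    squared = weights ** 2
    total = squared.sum()
    if total == 0.0:
        return 0.0  # no weight mass at all, so nothing is clustered
    return float((squared * same_cluster).sum() / total)


def clusterability_loss(weights, in_clusters, out_clusters):
    """Assumed loss: penalise weight mass that crosses cluster boundaries."""
    return 1.0 - clusterability(weights, in_clusters, out_clusters)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    block_diagonal = np.array([
        [1.0, 2.0, 0.0, 0.0],
        [3.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 2.0, 1.0],
        [0.0, 0.0, 1.0, 4.0],
    ])
    dense = np.ones((4, 4))
    print(clusterability(block_diagonal, labels, labels))  # fully modular layer
    print(clusterability(dense, labels, labels))           # fully enmeshed layer
```

Under these assumptions, a block-diagonal layer scores 1 because no weight crosses a cluster boundary, while a dense layer with two equal-sized clusters scores 0.5. In training, such a term would be added to the task loss with a weighting coefficient, which is likewise an assumption here.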
Taxonomy
Topics: Machine Learning in Healthcare
