Clusterability in Neural Networks
Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

TL;DR
This paper investigates clusterability in neural networks. It shows that trained networks are more clusterable than random ones and introduces training methods that increase clusterability with little loss of accuracy, with the aim of making network internals more interpretable.
Contribution
The paper introduces novel methods to promote clusterability during neural network training and shows that, in multi-layer perceptrons, they produce more clusterable networks with little loss of accuracy.
Findings
Trained networks are more clusterable than random networks.
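In multi-layer perceptrons, methods that promote clusterability yield more clusterable networks with little reduction in accuracy.
Clusterability may aid interpretability of neural network internals by making it easier to partition a network into meaningful clusters.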
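The first finding compares trained networks against random ones. The page does not spell out how clusterability is scored, so the sketch below rests on our own assumptions: an MLP is treated as an undirected graph whose edge strengths are absolute weights, neurons are partitioned by spectral clustering, and the partition is scored by its normalized cut (n-cut, lower means more clusterable). The null model permutes weights within each layer, which keeps the weight distribution while destroying structure. The "trained" network here is a hypothetical, hand-built modular MLP (`modular_mlp`), not a real trained model, and every function name is illustrative.

```python
import numpy as np


def mlp_adjacency(weights):
    """Symmetric adjacency over all neurons; weights[i] has shape (n_i, n_{i+1}).

    Edge strength between two neurons is |weight| (an assumption of this sketch).
    """
    sizes = [weights[0].shape[0]] + [w.shape[1] for w in weights]
    offsets = np.cumsum([0] + sizes)
    adj = np.zeros((offsets[-1], offsets[-1]))
    for i, w in enumerate(weights):
        rows = slice(offsets[i], offsets[i + 1])
        cols = slice(offsets[i + 1], offsets[i + 2])
        adj[rows, cols] = np.abs(w)
        adj[cols, rows] = np.abs(w).T
    return adj


def ncut(adj, labels):
    """Normalized cut: sum over clusters of cut(C, rest) / vol(C)."""
    degree = adj.sum(axis=1)
    total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        total += adj[inside][:, ~inside].sum() / degree[inside].sum()
    return total


def spectral_labels(adj, k, iters=50):
    """Spectral clustering: normalized-Laplacian embedding followed by k-means."""
    inv_sqrt = 1.0 / np.sqrt(np.maximum(adj.sum(axis=1), 1e-12))
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    emb = vecs[:, :k]
    emb /= np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    # Farthest-point initialization, then Lloyd iterations.
    centers = [emb[0]]
    for _ in range(1, k):
        dist = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((emb[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels


def clusterability(weights, k):
    """n-cut of the spectral partition of the network graph (lower = more clusterable)."""
    adj = mlp_adjacency(weights)
    return ncut(adj, spectral_labels(adj, k))


def shuffled(weights, rng):
    """Null model: permute entries within each layer (same weight distribution)."""
    return [rng.permutation(w.ravel()).reshape(w.shape) for w in weights]


def modular_mlp(sizes, k, rng, leak=0.05):
    """Hypothetical stand-in for a trained net: k strong blocks plus weak leakage."""
    weights = []
    for a, b in zip(sizes[:-1], sizes[1:]):
        w = leak * rng.normal(size=(a, b))
        block = (np.arange(a) * k // a)[:, None] == (np.arange(b) * k // b)[None, :]
        w[block] = rng.normal(size=int(block.sum()))
        weights.append(w)
    return weights


rng = np.random.default_rng(0)
k = 4
weights = modular_mlp([16, 32, 32, 8], k, rng)
score = clusterability(weights, k)
null_scores = [clusterability(shuffled(weights, rng), k) for _ in range(10)]
print(f"n-cut: {score:.3f}; shuffled mean: {np.mean(null_scores):.3f}")
```

Comparing against shuffled copies of the same weights, rather than against fresh initializations alone, separates structure in where the large weights sit from effects of the weight distribution itself.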
Abstract
The learned weights of a neural network have often been considered devoid of scrutable internal structure. In this paper, however, we look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity. We find that a trained neural network is typically more clusterable than randomly initialized networks, and often clusterable relative to random networks with the same distribution of weights. We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy. Understanding and controlling the clusterability of neural networks will hopefully render their inner workings more interpretable to engineers by facilitating partitioning into meaningful clusters.
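The abstract also mentions methods that promote clusterability during training. The page does not describe them, so the following is only one plausible approach and not necessarily the authors' method: a hypothetical regularizer (`spectral_penalty`) equal to the sum of the k smallest eigenvalues of the graph's normalized Laplacian. By the Ky Fan theorem this sum lower-bounds the normalized cut of every k-way partition, so a small value signals that k weakly connected groups exist. The sketch uses NumPy on hand-built graphs; using it as a training loss term would require an autodiff framework (for example PyTorch, whose `torch.linalg.eigvalsh` supports gradients).

```python
import numpy as np


def normalized_laplacian(adj):
    """L_sym = I - D^{-1/2} A D^{-1/2} for a symmetric, non-negative adjacency A."""
    inv_sqrt = 1.0 / np.sqrt(np.maximum(adj.sum(axis=1), 1e-12))
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def spectral_penalty(adj, k):
    """Hypothetical regularizer: sum of the k smallest eigenvalues of L_sym.

    This lower-bounds the normalized cut of any k-way partition, so driving it
    toward 0 favors graphs with k weakly connected groups.
    """
    return np.linalg.eigvalsh(normalized_laplacian(adj))[:k].sum()


def ncut(adj, labels):
    """Sum over clusters of cut(C, rest) / vol(C)."""
    degree = adj.sum(axis=1)
    return sum(adj[labels == c][:, labels != c].sum() / degree[labels == c].sum()
               for c in np.unique(labels))


rng = np.random.default_rng(1)
n, k = 40, 4
labels = np.arange(n) * k // n           # four equal groups of neurons
noise = np.abs(rng.normal(size=(n, n)))
noise = (noise + noise.T) / 2            # symmetric edge strengths
np.fill_diagonal(noise, 0.0)             # no self-loops
same_group = labels[:, None] == labels[None, :]
modular = np.where(same_group, noise, 0.02 * noise)  # weak between-group edges

print(spectral_penalty(modular, k), spectral_penalty(noise, k))
```

The modular graph gets a much smaller penalty than the unstructured one, and its penalty never exceeds the n-cut of its true grouping.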
Taxonomy
Topics: Neural Networks and Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
