Emergent Modularity in Pre-trained Transformers
Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

TL;DR
This paper investigates whether modularity emerges in pre-trained Transformers. It finds that neurons are functionally specialized, that neurons sharing a function cluster into Mixture-of-Experts-style experts, and that this modular structure stabilizes early in pre-training.
Contribution
It provides empirical evidence of functional modularity in Transformers and analyzes how modular structures develop during pre-training.
Findings
Functional experts are present in Transformers.
Perturbing experts affects corresponding functions.
Modularity stabilizes early in training.
Abstract
This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes. (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, and each module works for its corresponding function. Given the enormous amount of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, where clustered are the neurons specialized in a certain function.…
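The Mixture-of-Experts view described in the abstract can be illustrated with a minimal sketch. It assumes a two-layer ReLU feed-forward block and, purely for illustration, a random partition of its neurons; the paper groups neurons by function, which is not reproduced here. All names (`dense_ffn`, `partition_neurons`, `expert_outputs`, `sparse_ffn`) and the activation-sum routing score are hypothetical choices, not the authors' method. The sketch shows that the expert outputs sum exactly to the dense output, so activating only some experts approximates the full layer.

```python
import numpy as np

def dense_ffn(x, w_in, w_out):
    """Standard feed-forward block: ReLU(x W_in) W_out."""
    return np.maximum(x @ w_in, 0.0) @ w_out

def partition_neurons(num_neurons, num_experts, rng):
    """Split neuron indices into disjoint groups (random here; an assumption)."""
    perm = rng.permutation(num_neurons)
    return np.array_split(perm, num_experts)

def expert_outputs(x, w_in, w_out, experts):
    """Output contributed by each expert, i.e. by each group of neurons."""
    return [np.maximum(x @ w_in[:, idx], 0.0) @ w_out[idx, :] for idx in experts]

def sparse_ffn(x, w_in, w_out, experts, k):
    """Use only the k experts with the largest summed activation (hypothetical router)."""
    scores = [np.maximum(x @ w_in[:, idx], 0.0).sum() for idx in experts]
    top = np.argsort(scores)[-k:]
    outs = expert_outputs(x, w_in, w_out, experts)
    return sum(outs[i] for i in top), sorted(int(i) for i in top)

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 32, 4
w_in = rng.normal(size=(d_model, d_ff))
w_out = rng.normal(size=(d_ff, d_model))
x = rng.normal(size=d_model)

experts = partition_neurons(d_ff, n_experts, rng)
full = dense_ffn(x, w_in, w_out)
recombined = sum(expert_outputs(x, w_in, w_out, experts))
approx, chosen = sparse_ffn(x, w_in, w_out, experts, k=2)
```

Because each hidden neuron contributes to the output independently, `recombined` matches `full` for any partition; what the partition changes is how much is lost when only `k` experts are kept.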
Taxonomy
Topics: Cell Image Analysis Techniques · Neural Networks and Applications · Machine Learning in Materials Science
