Modularity in Transformers: Investigating Neuron Separability & Specialization

Nicholas Pochinkov; Thomas Jones; Mohammed Rashidur Rahman

arXiv:2408.17324·cs.LG·September 2, 2024

Open Access

TL;DR

This paper explores the internal neuron structure of transformer models, revealing task-specific clusters and inherent modularity that can inform interpretability and efficiency improvements.

Contribution

It introduces a novel analysis combining pruning and MoEfication clustering to uncover neuron specialization and overlap across tasks in transformer models.

Findings

1. Neuron clusters are task-specific, with some overlap between tasks.

2. Neuron importance patterns persist to some extent even in randomly initialized models.

3. MoEfication clusters align with task-specific neurons in different layers.

Abstract

Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited. This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models. Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets. Our findings reveal evidence of task-specific neuron clusters, with varying degrees of overlap between related tasks. We observe that neuron importance patterns persist to some extent even in randomly initialized models, suggesting an inherent structure that training refines. Additionally, we find that neuron clusters identified through MoEfication correspond more strongly to task-specific neurons in earlier and…
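The overlap analysis the abstract describes can be illustrated with a small sketch. The code below is not the authors' implementation: the importance score (mean absolute activation), the ratio against a baseline dataset, the Jaccard overlap measure, and every function name are assumptions chosen for illustration, and the activations are synthetic Gaussian data rather than outputs of ViT or Mistral 7B.

```python
import random


def importance_scores(activations):
    """Mean absolute activation per neuron (an assumed importance measure)."""
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(abs(row[j]) for row in activations) / n_samples
        for j in range(n_neurons)
    ]


def top_neurons(task_scores, baseline_scores, fraction, eps=1e-8):
    """Pick the neurons with the highest task-to-baseline importance ratio."""
    ratios = [t / (b + eps) for t, b in zip(task_scores, baseline_scores)]
    k = max(1, int(len(ratios) * fraction))
    ranked = sorted(range(len(ratios)), key=lambda j: ratios[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Intersection over union of two neuron index sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def synthetic_activations(rng, n_samples, n_neurons, boosted):
    """Gaussian activations where the `boosted` neurons have larger scale."""
    return [
        [rng.gauss(0.0, 5.0 if j in boosted else 1.0) for j in range(n_neurons)]
        for _ in range(n_samples)
    ]


rng = random.Random(0)
n_neurons = 64

# Hypothetical setup: task A excites neurons 0-7, task B excites neurons 4-11.
baseline = importance_scores(synthetic_activations(rng, 200, n_neurons, set()))
task_a = importance_scores(
    synthetic_activations(rng, 200, n_neurons, set(range(0, 8)))
)
task_b = importance_scores(
    synthetic_activations(rng, 200, n_neurons, set(range(4, 12)))
)

# Select the top 12.5% of neurons (8 of 64) for each task and compare them.
sel_a = top_neurons(task_a, baseline, fraction=0.125)
sel_b = top_neurons(task_b, baseline, fraction=0.125)
overlap = jaccard(sel_a, sel_b)
print(f"shared neurons: {sorted(sel_a & sel_b)}, Jaccard overlap: {overlap:.2f}")
```

In this toy setting the two selected sets share only the neurons that both synthetic tasks excite, which mirrors the kind of partial overlap between related tasks that the abstract reports. With a real model, the activations would come from hooks on the feed-forward layers, and the choice of importance measure would likely change the result.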

Taxonomy

Topics: Neural Networks and Applications

Methods: Pruning