TL;DR
This paper investigates the role of individual neurons in pre-trained language models, showing how linguistic information is distributed across neurons and how architectures differ in that distribution.
Contribution
It provides a detailed neuron-level analysis of linguistic knowledge in pre-trained models, highlighting differences across architectures and levels of linguistic complexity.
Findings
Lower-level tasks like morphology are localized in fewer neurons.
Higher-level tasks such as syntax involve more distributed neurons.
XLNet neurons are more localized and disjoint than those in BERT.
Abstract
While a lot of analysis has been carried out to demonstrate the linguistic knowledge captured by the representations learned within deep NLP models, very little attention has been paid to individual neurons. We carry out a neuron-level analysis using the core linguistic tasks of predicting morphology, syntax and semantics, on pre-trained language models, with questions like: i) do individual neurons in pre-trained models capture linguistic information? ii) which parts of the network learn more about certain linguistic phenomena? iii) how distributed or focused is the information? and iv) how do various architectures differ in learning these properties? We found small subsets of neurons to predict linguistic tasks, with lower-level tasks (such as morphology) localized in fewer neurons, compared to the higher-level task of predicting syntax. Our study also reveals interesting cross-architectural…
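The abstract does not spell out the probing procedure. As a rough illustration of what "localized in fewer neurons" could mean in practice, here is a minimal sketch, assuming a linear probe trained on neuron activations whose weight magnitudes are then used to rank neurons. The synthetic activations, the function names (`train_linear_probe`, `rank_neurons`, `min_neurons_for_mass`), and the 80% weight-mass threshold are all illustrative assumptions, not the paper's method.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.5, steps=300, l2=1e-3):
    """Fit a binary logistic-regression probe with plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y) / n + l2 * w)  # L2-regularized gradient step
        b -= lr * np.mean(p - y)
    return w, b


def rank_neurons(w):
    """Return neuron indices sorted by decreasing absolute probe weight."""
    return np.argsort(-np.abs(w))


def min_neurons_for_mass(w, mass=0.8):
    """Smallest number of top-ranked neurons covering `mass` of total |weight|."""
    mags = np.sort(np.abs(w))[::-1]
    cum = np.cumsum(mags) / mags.sum()
    return int(np.searchsorted(cum, mass) + 1)


# Synthetic stand-in for activations: 500 tokens x 20 neurons (assumption).
# Only neurons 3 and 7 carry the label, mimicking a "localized" property.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
y = ((X[:, 3] + X[:, 7]) > 0).astype(float)

w, b = train_linear_probe(X, y)
top2 = rank_neurons(w)[:2]           # the two highest-ranked neurons
needed = min_neurons_for_mass(w)     # how many neurons hold 80% of the weight
```

Under this toy setup, a property counts as more localized when `min_neurons_for_mass` returns a smaller number; comparing that count across tasks or models is one simple way to make the localized-versus-distributed contrast concrete.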
Taxonomy
Methods: Linear Layer · Dense Connections · Layer Normalization · Byte Pair Encoding · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · SentencePiece · Attention Dropout
