Disentangling Representations of Text by Masking Transformers
Xiongyi Zhang, Jan-Willem van de Meent, Byron C. Wallace

TL;DR
This paper proposes a method to identify and extract disentangled, aspect-specific subnetworks within pretrained transformer models like BERT by learning binary masks, enabling targeted interpretability and improved task performance.
Contribution
It introduces a masking-based approach that uncovers sparse subnetworks within BERT encoding distinct features, removing the need to train a disentangled model from scratch for each task.
Findings
Subnetworks strongly encode specific aspects like toxicity or sentiment
Masking combined with pruning identifies sparse, interpretable subnetworks
Disentanglement via masking matches or exceeds prior methods in effectiveness
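The pruning step mentioned in the findings can be illustrated with a small sketch. This is generic magnitude pruning in NumPy, not the authors' implementation: the function name `magnitude_prune_mask` and the `sparsity` parameter are hypothetical, and the summary does not say how pruning is scheduled or applied per layer.

```python
import numpy as np

def magnitude_prune_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of
    entries with the smallest absolute value (hypothetical helper)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(sparsity * weights.size))  # number of weights to drop
    mask = np.ones(weights.size, dtype=weights.dtype)
    if k > 0:
        magnitudes = np.abs(weights).ravel()
        # Indices of the k smallest magnitudes (order within them is irrelevant).
        drop = np.argpartition(magnitudes, k - 1)[:k]
        mask[drop] = 0
    return mask.reshape(weights.shape)

# Toy weight matrix standing in for one transformer weight matrix.
W = np.array([[0.1, -2.0],
              [0.5, -0.05]])
prune_mask = magnitude_prune_mask(W, sparsity=0.5)
W_pruned = W * prune_mask  # surviving weights keep their original values
```

In this sketch the mask depends only on weight magnitudes, so it can be combined with a separately learned aspect mask by elementwise multiplication.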
Abstract
Representations from large pretrained models such as BERT encode a range of features into monolithic vectors, affording strong predictive accuracy across a multitude of downstream tasks. In this paper we explore whether it is possible to learn disentangled representations by identifying existing subnetworks within pretrained models that encode distinct, complementary aspect representations. Concretely, we learn binary masks over transformer weights or hidden units to uncover subsets of features that correlate with a specific factor of variation; this eliminates the need to train a disentangled model from scratch for a particular task. We evaluate this method with respect to its ability to disentangle representations of sentiment from genre in movie reviews, "toxicity" from dialect in Tweets, and syntax from semantics. By combining masking with magnitude pruning we find that we can…
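To make the idea of learning binary masks over frozen weights concrete, here is a minimal NumPy sketch for a single linear layer. It assumes real-valued scores thresholded at zero and a straight-through gradient estimate, which is one common way to train binary masks; the abstract does not specify the training procedure, and the names `binarize`, `masked_linear`, and `score_gradient` are hypothetical.

```python
import numpy as np

def binarize(scores, threshold=0.0):
    """Hard-threshold real-valued scores into a 0/1 mask (assumed scheme)."""
    return (scores > threshold).astype(scores.dtype)

def masked_linear(x, W, scores):
    """Forward pass with frozen weights W (out, in) and a learned binary mask."""
    M = binarize(scores)
    return x @ (W * M).T  # shape: (batch, out)

def score_gradient(x, W, grad_out):
    """Straight-through estimate of dL/dscores.

    The threshold is treated as the identity in the backward pass,
    so dL/dscores is approximated by dL/dM = dL/d(W*M) * W.
    W itself is never updated.
    """
    grad_effective = grad_out.T @ x  # dL/d(W*M), shape (out, in)
    return grad_effective * W

# Toy example: one input row, a 2x2 frozen weight matrix, and mask scores.
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
scores = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])
x = np.array([[1.0, 1.0]])

y = masked_linear(x, W, scores)
g = score_gradient(x, W, grad_out=np.ones_like(y))
scores_updated = scores - 0.1 * g  # one plain gradient step on the scores only
```

Only `scores` changes during training in this sketch; learning a separate set of scores per aspect (for example sentiment versus genre) would yield different subnetworks of the same pretrained weights.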
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Linear Layer · Dense Connections · Softmax · Linear Warmup With Linear Decay · Weight Decay · WordPiece
