Discovering Knowledge-Critical Subnetworks in Pretrained Language Models

Deniz Bayazit; Negar Foroutan; Zeming Chen; Gail Weiss; Antoine Bosselut

arXiv:2310.03084 · cs.CL · October 16, 2024

TL;DR

This paper introduces a method to identify and remove knowledge-critical subnetworks in pretrained language models, enabling precise knowledge suppression while preserving overall model performance.

Contribution

It proposes a multi-objective differentiable masking scheme to discover sparse subnetworks responsible for specific knowledge in language models.
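The idea of a multi-objective differentiable mask can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation (the official repository is listed under Code & Models). It relaxes a binary mask over one frozen toy weight matrix with a sigmoid and treats masked weights as the candidate subnetwork to remove. It then combines three hypothetical loss terms: a suppression term (KL divergence from a uniform distribution to the remaining model's predictions on target knowledge), a maintenance term (KL divergence from the original model's predictions on control inputs to the remaining model's), and a sparsity term (mean mask probability). The function names, the exact form of each term, and the `lambdas` weights are assumptions for illustration; the paper's objective may use different or additional terms.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL(p || q) per row, averaged over the batch
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def remaining_logits(W, mask_logits, x):
    # Soft subnetwork membership: m -> 1 means "this weight is removed"
    m = sigmoid(mask_logits)
    # The remaining model keeps only the weights outside the subnetwork
    return x @ (W * (1.0 - m))

def multi_objective_loss(W, mask_logits, x_target, x_control, lambdas=(1.0, 1.0, 1.0)):
    """Hypothetical three-term objective: W is frozen, mask_logits are trainable."""
    lam_sup, lam_main, lam_sparse = lambdas
    vocab_size = W.shape[1]

    p_target = softmax(remaining_logits(W, mask_logits, x_target))
    p_control = softmax(remaining_logits(W, mask_logits, x_control))
    p_control_orig = softmax(x_control @ W)
    uniform = np.full_like(p_target, 1.0 / vocab_size)

    terms = {
        # Suppression: remaining model should be uninformative on target knowledge
        "suppression": kl(uniform, p_target),
        # Maintenance: remaining model should match the original on control inputs
        "maintenance": kl(p_control_orig, p_control),
        # Sparsity: expected fraction of weights assigned to the subnetwork
        "sparsity": float(sigmoid(mask_logits).mean()),
    }
    terms["total"] = (lam_sup * terms["suppression"]
                      + lam_main * terms["maintenance"]
                      + lam_sparse * terms["sparsity"])
    return terms

# Toy setup: 4-dim inputs, 5-way output "vocabulary"
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 5))
x_target, x_control = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))

# Extreme masks show the trade-off the objective has to balance
empty = multi_objective_loss(W, np.full(W.shape, -20.0), x_target, x_control)
full = multi_objective_loss(W, np.full(W.shape, 20.0), x_target, x_control)
```

An empty subnetwork leaves the model unchanged (no suppression, nothing removed), while removing every weight suppresses the target predictions but also destroys behavior on control inputs. In a typical differentiable-masking setup, only `mask_logits` would be updated by gradient descent so that the optimum lies between these extremes; this sketch only evaluates the objective.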

Findings

01

Highly sparse subnetworks (98%+ sparsity) are critical for specific knowledge.

02

Removing these subnetworks suppresses targeted knowledge with minimal impact on other abilities.

03

The method is demonstrated on multiple GPT2 variants.

Abstract

Pretrained language models (LMs) encode implicit representations of knowledge in their parameters. However, localizing these representations and disentangling them from each other remains an open problem. In this work, we investigate whether pretrained language models contain various knowledge-critical subnetworks: particular sparse computational subgraphs that can, if removed, precisely suppress specific knowledge the model has memorized. We propose a multi-objective differentiable masking scheme that can be applied to both weights and neurons to discover such subnetworks and show that we can use them to precisely remove specific knowledge from models while minimizing adverse effects on the behavior of the original model. We demonstrate our method on multiple GPT2 variants, uncovering highly sparse subnetworks (98%+ sparsity) that are critical for expressing specific collections of…
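The abstract states that the masking scheme applies to both weights and neurons, and the findings report subnetworks with 98%+ sparsity. The minimal NumPy sketch below shows how the two granularities relate. It assumes that sparsity is the fraction of maskable parameters that are not in the removed subnetwork, and that a "neuron" is an output unit of a linear layer (a column of `W` when computing `x @ W`). Both definitions, and all function names, are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def sparsity(subnet_mask):
    # Assumed definition: share of maskable parameters NOT in the subnetwork
    return 1.0 - float(np.mean(subnet_mask))

def neuron_to_weight_mask(neuron_mask, weight_shape):
    # Selecting an output neuron (a column of W for x @ W) selects all its incoming weights
    return np.broadcast_to(neuron_mask[np.newaxis, :], weight_shape)

def remove_subnetwork(W, subnet_mask):
    # Zero out the subnetwork weights; all other weights are left untouched
    return W * ~subnet_mask

W = np.ones((100, 50))  # toy layer with 5,000 maskable weights

# Weight-level mask: select 100 individual weights
weight_mask = np.zeros(W.shape, dtype=bool)
weight_mask.flat[:100] = True

# Neuron-level mask: select 1 of 50 output neurons (its 100 incoming weights)
neuron_mask = np.zeros(W.shape[1], dtype=bool)
neuron_mask[3] = True
neuron_weight_mask = neuron_to_weight_mask(neuron_mask, W.shape)

print(sparsity(weight_mask), sparsity(neuron_weight_mask))
pruned = remove_subnetwork(W, neuron_weight_mask)
```

Both masks cover 100 of 5,000 weights, i.e., 98% sparsity under the assumed definition. In this toy setup the neuron-level mask needs one mask entry per unit instead of one per weight, which makes it coarser: it can only remove whole units, whereas the weight-level mask can pick out individual connections.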

Code & Models

Repositories

bayazitdeniz/know-subnet (Official)

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning