Fast Model Editing at Scale
Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning

TL;DR
This paper introduces MEND, a scalable and efficient method for editing the behavior of large pre-trained models with small auxiliary editor networks. It makes rapid, targeted updates from a single input and desired output, without retraining the model.
Contribution
MEND is the first approach shown to effectively edit models with over 10 billion parameters. Even at that scale, its editor networks can be trained on a single GPU in less than a day.
Findings
MEND successfully edits T5, GPT, BERT, and BART models.
It outperforms existing editing methods in both edit effectiveness and computational efficiency.
MEND enables rapid, local model updates at scale.
Abstract
While large pre-trained models have enabled impressive results on a variety of downstream tasks, the largest existing models still make errors, and even accurate predictions may become outdated over time. Because detecting all such failures at training time is impossible, enabling both developers and end users of such models to correct inaccurate outputs while leaving the model otherwise intact is desirable. However, the distributed, black-box nature of the representations learned by large neural networks makes producing such targeted edits difficult. If presented with only a single problematic input and new desired output, fine-tuning approaches tend to overfit; other editing algorithms are either computationally infeasible or simply ineffective when applied to very large models. To enable easy post-hoc editing at scale, we propose Model Editor Networks using Gradient Decomposition…
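The abstract names gradient decomposition as the core of MEND. Consider a linear layer y = W u applied to a single input vector u. The gradient of the loss with respect to W is the outer product δ u^T, where δ is the gradient with respect to the layer output y. An editor can therefore operate on the two vectors u and δ instead of on the full weight matrix.

The Python sketch below illustrates only that rank-1 structure. Several parts are assumptions for illustration and are not taken from this summary:
- the squared-error loss;
- the single-step update;
- the hypothetical caller-supplied functions `editor_u` and `editor_delta`, which stand in for MEND's learned editor networks.

It is not the authors' implementation.

```python
# Minimal sketch, assuming one linear layer y = W u, one input vector u,
# and a squared-error loss. editor_u / editor_delta are hypothetical
# stand-ins for MEND's learned editor networks, not the paper's architecture.
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, u: Vector) -> Vector:
    """Compute W u for a row-major matrix."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]


def outer(a: Vector, b: Vector) -> Matrix:
    """Rank-1 matrix a b^T with entries a[i] * b[j]."""
    return [[a_i * b_j for b_j in b] for a_i in a]


def squared_error(W: Matrix, u: Vector, target: Vector) -> float:
    """L = 0.5 * ||W u - target||^2."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(matvec(W, u), target))


def gradient_factors(W: Matrix, u: Vector, target: Vector) -> Tuple[Vector, Vector]:
    """Return (u, delta) such that dL/dW = delta u^T.

    For the squared-error loss, delta = dL/dy = W u - target, so the full
    d_out x d_in gradient is determined by only d_in + d_out numbers.
    """
    delta = [y - t for y, t in zip(matvec(W, u), target)]
    return u, delta


def apply_edit(
    W: Matrix,
    u: Vector,
    delta: Vector,
    editor_u: Callable[[Vector], Vector],
    editor_delta: Callable[[Vector], Vector],
    lr: float,
) -> Matrix:
    """Transform each factor, rebuild a rank-1 update, and take one step.

    Returns W - lr * delta_t u_t^T, where u_t = editor_u(u) and
    delta_t = editor_delta(delta). The input matrix W is not modified.
    """
    step = outer(editor_delta(delta), editor_u(u))
    return [
        [w - lr * s for w, s in zip(w_row, s_row)]
        for w_row, s_row in zip(W, step)
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    x = [1.0, 2.0]
    target = [0.0, 0.0]
    u, delta = gradient_factors(W, x, target)
    # Identity editors: the edit reduces to one plain gradient step.
    W_new = apply_edit(W, u, delta, lambda v: v, lambda v: v, lr=0.1)
    print(squared_error(W, x, target), squared_error(W_new, x, target))
```

With identity editors, `apply_edit` reduces to a single ordinary gradient step on one example. That is essentially the fine-tuning baseline that the abstract says tends to overfit. MEND instead learns how to transform u and δ.

Working on the factors also keeps the editor's input small for wide layers. For a 1,024 × 1,024 layer, the full gradient has 1,048,576 entries, while u and δ together have 2,048.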
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Model Editor Networks with Gradient Decomposition · Linear Layer · Adafactor · SentencePiece · Gated Linear Unit · Inverse Square Root Schedule · Cosine Annealing · Weight Decay
