Fast Model Editing at Scale

Eric Mitchell; Charles Lin; Antoine Bosselut; Chelsea Finn; Christopher D. Manning

arXiv:2110.11309 · cs.LG · June 15, 2022 · 1 citation

TL;DR

This paper introduces MEND, a scalable and efficient method for editing large pre-trained models' behavior using small auxiliary networks, enabling rapid, targeted updates without extensive retraining.

Contribution

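Among the editing methods compared, MEND is the only one that effectively edits the behavior of models with more than 10 billion parameters. The editor networks can be trained on a single GPU in less than a day, even for models of that size, and once trained they apply new edits rapidly.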

Findings

1. MEND successfully edits models like T5, GPT, BERT, and BART.

2. It outperforms existing methods in effectiveness and efficiency.

3. MEND enables rapid, local model updates at scale.

Abstract

While large pre-trained models have enabled impressive results on a variety of downstream tasks, the largest existing models still make errors, and even accurate predictions may become outdated over time. Because detecting all such failures at training time is impossible, enabling both developers and end users of such models to correct inaccurate outputs while leaving the model otherwise intact is desirable. However, the distributed, black-box nature of the representations learned by large neural networks makes producing such targeted edits difficult. If presented with only a single problematic input and new desired output, fine-tuning approaches tend to overfit; other editing algorithms are either computationally infeasible or simply ineffective when applied to very large models. To enable easy post-hoc editing at scale, we propose Model Editor Networks using Gradient Decomposition…
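The "gradient decomposition" in the method's name rests on a standard property of fully-connected layers: for a single example, the gradient of the loss with respect to the weight matrix is the outer product of two vectors, the layer input and the gradient at the layer output. An editor network can therefore work on those two small vectors instead of the full matrix. The sketch below only checks this rank-1 property numerically. It is not the authors' implementation; the squared-error loss, the function names, and the tiny dimensions are illustrative assumptions.

```python
import random


def linear_forward(W, u):
    """Compute z = W u for a weight matrix W (list of rows) and input vector u."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]


def loss(W, u, target):
    """Illustrative squared-error loss 0.5 * ||W u - target||^2 (an assumption)."""
    z = linear_forward(W, u)
    return 0.5 * sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def weight_gradient_factors(W, u, target):
    """Return (delta, u) such that dL/dW = outer(delta, u).

    delta is dL/dz, the gradient at the layer output; for the squared-error
    loss above it is simply z - target.
    """
    z = linear_forward(W, u)
    delta = [zi - ti for zi, ti in zip(z, target)]
    return delta, u


def outer(a, b):
    """Outer product a b^T as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]


def numerical_gradient(W, u, target, eps=1e-6):
    """Central finite-difference estimate of dL/dW, one entry at a time."""
    grad = []
    for i in range(len(W)):
        row = []
        for j in range(len(W[0])):
            W_plus = [r[:] for r in W]
            W_minus = [r[:] for r in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            row.append((loss(W_plus, u, target) - loss(W_minus, u, target)) / (2 * eps))
        grad.append(row)
    return grad


if __name__ == "__main__":
    random.seed(0)
    d_out, d_in = 3, 4  # toy sizes; real layers are far larger
    W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
    u = [random.uniform(-1, 1) for _ in range(d_in)]
    target = [random.uniform(-1, 1) for _ in range(d_out)]

    delta, u_in = weight_gradient_factors(W, u, target)
    analytic = outer(delta, u_in)
    numeric = numerical_gradient(W, u, target)

    max_diff = max(
        abs(a - n)
        for a_row, n_row in zip(analytic, numeric)
        for a, n in zip(a_row, n_row)
    )
    print("largest gap between outer-product and numerical gradient:", max_diff)
    print("full gradient entries:", d_out * d_in, "| factor entries:", d_out + d_in)
```

For a layer with a d_out × d_in weight matrix, the two factors hold d_out + d_in numbers rather than d_out · d_in, which is why operating on the factors stays tractable as layer width grows.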

Code & Models

Videos

Fast Model Editing at Scale · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Model Editor Networks with Gradient Decomposition · Linear Layer · Adafactor · SentencePiece · Gated Linear Unit · Inverse Square Root Schedule · Cosine Annealing · Weight Decay