Resolving Lexical Bias in Model Editing
Hammad Rizwan, Domenic Rosati, Ga Wu, Hassan Sajjad

TL;DR
This paper introduces PENME, a model editing method that learns a disentangled representation space so that edits to large language models are applied more precisely and efficiently, reducing the lexical bias that affects adapter-based editors.
Contribution
The paper proposes a new approach to model editing that disentangles representations for better localization and robustness, outperforming previous methods in accuracy and efficiency.
Findings
Achieves state-of-the-art editing performance
More computationally efficient during inference
Effective across different model architectures
Abstract
Model editing aims to modify the outputs of large language models after they are trained. Previous approaches have often involved direct alterations to model weights, which can result in model degradation. Recent techniques avoid making modifications to the model's weights by using an adapter that applies edits to the model when triggered by semantic similarity in the representation space. We demonstrate that current adapter methods are critically vulnerable to strong lexical biases, leading to issues such as applying edits to irrelevant prompts with overlapping words. This paper presents a principled approach to learning a disentangled representation space that facilitates precise localization of edits by maintaining distance between irrelevant prompts while preserving proximity among paraphrases. In our empirical study, we show that our method (Projector Editor Networks for Model…
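The idea in the abstract, keeping paraphrases close and irrelevant prompts far apart so that a distance check can decide whether an edit fires, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss, the projector, or the lookup rule, so the margin-based contrastive loss, the per-edit thresholds, and the names `contrastive_loss` and `lookup_edit` are all assumptions made for illustration.

```python
import numpy as np


def contrastive_loss(z_a, z_b, is_paraphrase, margin=1.0):
    """Pairwise margin-based contrastive loss (an assumed, generic form).

    z_a, z_b: arrays of shape (n, d) holding projected prompt representations.
    is_paraphrase: array of shape (n,), 1.0 for paraphrase pairs and
    0.0 for irrelevant (for example, lexically overlapping) pairs.
    """
    dist = np.linalg.norm(z_a - z_b, axis=-1)
    # Paraphrase pairs are pulled together.
    pull = is_paraphrase * dist ** 2
    # Irrelevant pairs are pushed apart until they are at least `margin` away.
    push = (1.0 - is_paraphrase) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pull + push))


def lookup_edit(query, edit_keys, thresholds):
    """Return the index of the nearest stored edit if the query falls
    within that edit's distance threshold, otherwise None.

    The per-edit threshold is a hypothetical design choice for this sketch.
    """
    dist = np.linalg.norm(edit_keys - query, axis=-1)
    nearest = int(np.argmin(dist))
    return nearest if dist[nearest] <= thresholds[nearest] else None


# Toy 2-D "projected" space with two stored edits.
edit_keys = np.array([[0.0, 0.0], [3.0, 0.0]])
thresholds = np.array([0.5, 0.5])

paraphrase_query = np.array([0.1, 0.0])   # close to edit 0, so the edit fires
irrelevant_query = np.array([1.5, 0.0])   # far from both edits, so nothing fires

print(lookup_edit(paraphrase_query, edit_keys, thresholds))
print(lookup_edit(irrelevant_query, edit_keys, thresholds))
```

In this toy setting, a prompt that shares words with an edited prompt but lands outside every threshold leaves the base model untouched, which is the behavior the abstract describes as precise localization. How well that works in practice depends on how the projection is trained, which this sketch does not cover.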
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification
Methods: Adapter · Balanced Selection · Contrastive Learning
