PMET: Precise Model Editing in a Transformer

Xiaopeng Li; Shasha Li; Shezheng Song; Jing Yang; Jun Ma; and Jie Yu

arXiv:2308.08742 · cs.CL · March 12, 2024 · 6 citations

TL;DR

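PMET edits knowledge in transformers by optimizing the hidden states of the MHSA and FFN components rather than the full layer hidden state, and it uses only the FFN states to update FFN weights. This yields more precise updates and state-of-the-art results on the COUNTERFACT and zsRE knowledge editing benchmarks.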

Contribution

The paper presents PMET, a model editing method that optimizes the hidden states of individual transformer components (MHSA and FFN) instead of whole-layer hidden states, exploiting the distinct roles the two components play.

Findings

01. MHSA encodes general knowledge extraction patterns.

02. Updating FFN weights from the optimized FFN hidden states alone, rather than from full layer hidden states, improves editing precision.

03. PMET achieves state-of-the-art results on the COUNTERFACT and zsRE datasets.
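
To make the second finding concrete, the sketch below treats a single linear map as a key-value memory and edits it so that one key returns a new value. It is a minimal illustration under loud assumptions: `rank_one_edit`, the toy shapes, and the random data are all hypothetical, and the minimum-norm single-fact update shown here is a textbook construction, not the solver PMET uses. Editing methods in this family typically solve a regularized least-squares problem over many keys.

```python
import numpy as np


def rank_one_edit(W, k, v_target):
    """Return W' such that W' @ k == v_target.

    Among all such W', this one is closest to W in Frobenius norm.
    Hypothetical helper for illustration only.
    """
    residual = v_target - W @ k              # what the memory currently gets wrong for this key
    delta = np.outer(residual, k) / (k @ k)  # rank-one correction aligned with the key
    return W + delta


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))    # toy stand-in for an FFN output projection
k = rng.normal(size=6)         # key: FFN inner activation for the edited fact
v_target = rng.normal(size=4)  # value: an optimized FFN hidden state (assumed given)

W_edited = rank_one_edit(W, k, v_target)

# A key orthogonal to k is unaffected by the edit.
other = rng.normal(size=6)
k_orth = other - (other @ k) / (k @ k) * k

print(np.allclose(W_edited @ k, v_target))
print(np.allclose(W_edited @ k_orth, W @ k_orth))
```

The point of the finding is what goes into `v_target`: if it is a full layer hidden state, the weight update also has to absorb attention and residual information that the FFN does not need to store.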

Abstract

Model editing techniques modify a small proportion of the knowledge in Large Language Models (LLMs) at relatively low cost and have demonstrated notable success. Existing methods assume that Transformer Layer (TL) hidden states are the values of the key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use them to update the weights of the FFN in LLMs. However, the information in TL hidden states flows from three sources: Multi-Head Self-Attention (MHSA), the FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contain information not specifically required by the FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze the hidden states of MHSA and FFN, finding that MHSA encodes certain general knowledge extraction patterns. This implies…
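
The three-part information flow described in the abstract can be illustrated with a toy layer. This is a rough sketch, not the architecture of any particular model: the random linear map standing in for MHSA, the ReLU feed-forward block, the absence of layer normalization, and every name below are assumptions made for illustration, and real attention also mixes information across token positions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

# Hypothetical stand-in weights; a real layer would be trained.
W_attn = rng.normal(size=(d, d)) / np.sqrt(d)
W_in = rng.normal(size=(4 * d, d)) / np.sqrt(d)
W_out = rng.normal(size=(d, 4 * d)) / np.sqrt(4 * d)


def mhsa(h):
    """Toy replacement for multi-head self-attention on one token vector."""
    return W_attn @ h


def ffn(h):
    """Feed-forward block viewed as a key-value memory: key = inner activation."""
    key = np.maximum(W_in @ h, 0.0)
    return W_out @ key


def transformer_layer(h_prev):
    """Return the layer hidden state and the two component hidden states."""
    a = mhsa(h_prev)     # MHSA contribution
    m = ffn(h_prev + a)  # FFN contribution
    h = h_prev + a + m   # the residual stream sums all three sources
    return h, a, m


h_prev = rng.normal(size=d)
h, a, m = transformer_layer(h_prev)

# Everything in the layer hidden state that did not come from the FFN.
non_ffn_part = h - m
print(np.linalg.norm(non_ffn_part), np.linalg.norm(m))
```

In this toy setup the layer hidden state `h` differs from the FFN output `m` by the residual input plus the attention output, which is the extra information the abstract says existing methods fold into the FFN update.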

Code & Models

Repositories

xpq-tech/pmet (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Absolute Position Encodings · Residual Connection