Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach

Jingyuan Yang; Dapeng Chen; Yajing Sun; Rongjun Li; Zhiyong Feng; Wei Peng

arXiv:2501.11041·cs.CL·January 22, 2025


TL;DR

This paper proposes an interpretable model editing approach that enhances the semantic consistency of large language models by injecting biases into key attention heads, achieving significant improvements efficiently.
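
To make "injecting biases into key attention heads" concrete, here is a minimal NumPy sketch of the general idea, not the authors' implementation. The function names (`head_outputs`, `inject_head_biases`), the toy dimensions, the choice of which head to edit, and the constant bias value are all illustrative assumptions. This summary does not say how the paper selects heads or computes its biases.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def head_outputs(x, w_q, w_k, w_v):
    """Per-head scaled dot-product attention.

    x: (seq_len, d_model); w_q, w_k, w_v: (n_heads, d_model, d_head).
    Returns an array of shape (n_heads, seq_len, d_head).
    """
    q = np.einsum("sd,hde->hse", x, w_q)
    k = np.einsum("sd,hde->hse", x, w_k)
    v = np.einsum("sd,hde->hse", x, w_v)
    scores = np.einsum("hse,hte->hst", q, k) / np.sqrt(q.shape[-1])
    return np.einsum("hst,hte->hse", softmax(scores), v)


def inject_head_biases(heads, biases):
    """Add a fixed bias vector to the output of each selected head.

    biases maps a head index to a vector of shape (d_head,).
    Heads that are not listed are returned unchanged.
    """
    edited = heads.copy()
    for h, b in biases.items():
        edited[h] += b  # broadcast over all sequence positions
    return edited


# Toy setup: random weights stand in for a trained model (assumption).
rng = np.random.default_rng(0)
seq_len, d_model, n_heads, d_head = 4, 8, 2, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))

heads = head_outputs(x, w_q, w_k, w_v)
# Hypothetical edit: shift only head 1 by a placeholder bias vector.
biases = {1: np.full(d_head, 0.5)}
edited = inject_head_biases(heads, biases)
```

Because the edit is a fixed additive shift on a few head outputs, the model weights themselves are never retrained in this sketch, which is the contrast with finetuning that the summary draws.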

Contribution

The paper introduces a cost-effective, interpretability-oriented model editing method targeting attention heads to improve LLM semantic consistency without extensive fine-tuning.

Findings

01 Significant improvement in semantic consistency across datasets.

02 Method generalizes well to tasks beyond primary training.

03 Efficient alternative to traditional fine-tuning methods.

Abstract

A Large Language Model (LLM) tends to generate inconsistent and sometimes contradictory outputs when presented with a prompt that is semantically equivalent to the original prompt but worded differently. One of the key approaches to achieving semantic consistency is to finetune the model on prompt-output pairs with semantically equivalent meanings. Despite its effectiveness, data-driven finetuning incurs substantial computation costs in data preparation and model optimization. It also treats the LLM as a "black box", restricting our ability to gain deeper insight into its internal mechanisms. In this paper, we are motivated to enhance the semantic consistency of LLMs through a more interpretable method, namely model editing. We first identify the model components (i.e., attention heads) that have a key impact on the semantic consistency…


Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Softmax · Attention Is All You Need