Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach
Jingyuan Yang, Dapeng Chen, Yajing Sun, Rongjun Li, Zhiyong Feng, Wei Peng

TL;DR
This paper proposes an interpretable model editing approach that enhances the semantic consistency of large language models by injecting biases into key attention heads, achieving significant improvements efficiently.
Contribution
The paper introduces a cost-effective, interpretability-oriented model editing method targeting attention heads to improve LLM semantic consistency without extensive fine-tuning.
Findings
Significant improvement in semantic consistency across datasets.
The method generalizes well to tasks beyond the primary ones it targets.
Efficient alternative to traditional fine-tuning methods.
Abstract
A Large Language Model (LLM) tends to generate inconsistent and sometimes contradictory outputs when presented with a prompt that has equivalent semantics but is expressed differently from the original prompt. To achieve semantic consistency of an LLM, one of the key approaches is to finetune the model with prompt-output pairs with semantically equivalent meanings. Despite its effectiveness, a data-driven finetuning method incurs substantial computation costs in data preparation and model optimization. In this regime, an LLM is treated as a "black box", restricting our ability to gain deeper insights into its internal mechanism. In this paper, we are motivated to enhance the semantic consistency of LLMs through a more interpretable method (i.e., model editing). We first identify the model components (i.e., attention heads) that have a key impact on the semantic consistency…
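The TL;DR describes the edit as injecting biases into key attention heads. A minimal sketch of what such an injection could look like is given below. It assumes the per-head outputs are already available as nested lists and that the set of key heads and their bias vectors are given; how the paper selects heads and derives the biases is not stated in this summary. The name `inject_head_biases` and the `scale` parameter are hypothetical, introduced only for illustration.

```python
from typing import Dict, List

Vector = List[float]
HeadOutputs = List[List[Vector]]  # indexed as [head][token][dim]


def inject_head_biases(head_outputs: HeadOutputs,
                       biases: Dict[int, Vector],
                       scale: float = 1.0) -> HeadOutputs:
    """Return a copy of head_outputs with a bias added to selected heads.

    head_outputs[h][t] is the output vector of head h at token position t.
    biases maps a head index to a vector that is added at every position
    of that head; heads not listed in biases are copied unchanged.
    NOTE: illustrative assumption, not the paper's actual implementation.
    """
    edited: HeadOutputs = []
    for h, tokens in enumerate(head_outputs):
        bias = biases.get(h)
        if bias is None:
            # Not a selected head: copy its outputs as they are.
            edited.append([list(vec) for vec in tokens])
            continue
        new_tokens = []
        for vec in tokens:
            if len(vec) != len(bias):
                raise ValueError(f"bias for head {h} has wrong dimension")
            # Shift the head output by the (scaled) bias vector.
            new_tokens.append([x + scale * b for x, b in zip(vec, bias)])
        edited.append(new_tokens)
    return edited


# Toy example: 2 heads, 2 tokens, head dimension 3; only head 1 is edited.
outputs = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # head 0
    [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]],  # head 1
]
edited = inject_head_biases(outputs, {1: [0.5, -0.5, 0.0]})
```

In this sketch the model weights are never updated; only the outputs of the chosen heads are shifted, which is consistent with the summary's framing of the method as a cheaper alternative to fine-tuning.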
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: Softmax · Attention Is All You Need
