Establishing Knowledge Preference in Language Models
Sizhe Zhou, Sha Li, Yu Meng, Yizhu Jiao, Heng Ji, Jiawei Han

TL;DR
This paper introduces a unified framework for understanding and evaluating how language models prioritize different sources of knowledge, proposing a dataset synthesis method to fine-tune models for better adherence to a defined knowledge preference hierarchy.
Contribution
It unifies knowledge preference settings, creates evaluation datasets, and develops a fine-tuning method to improve models' adherence to the knowledge hierarchy.
Findings
A 7B model fine-tuned with the proposed method improves adherence to the knowledge preference hierarchy by over 18%
Proposed datasets enable systematic evaluation of knowledge preference
Fine-tuning with synthesized data enhances model compliance with hierarchy
Abstract
Language models are known to encode a great amount of factual knowledge through pretraining. However, such knowledge might be insufficient to cater to user requests, requiring the model to integrate external knowledge sources and adhere to user-provided specifications. When answering questions about ongoing events, the model should use recent news articles to update its response; when asked to provide recommendations, the model should prioritize user specifications over retrieved product reviews; when some facts are edited in the model, the updated facts should override all prior knowledge learned by the model even if they are conflicting. In all of the cases above, the model faces a decision between its own parametric knowledge, (retrieved) contextual knowledge, and user instruction knowledge. In this paper, we (1) unify such settings into the problem of knowledge preference and define…
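As an illustration only, and not the authors' fine-tuning or data synthesis method, the ordering implied by the examples above (instruction knowledge over contextual knowledge over parametric knowledge) can be sketched as a small conflict resolver. The names `Source`, `Claim`, and `resolve` are hypothetical and were introduced for this sketch.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Source(IntEnum):
    """Knowledge sources; a higher value means higher priority (assumed ordering)."""
    PARAMETRIC = 1   # facts the model encoded during pretraining
    CONTEXT = 2      # retrieved documents, e.g. recent news articles
    INSTRUCTION = 3  # user specifications or explicitly edited facts


@dataclass(frozen=True)
class Claim:
    """One candidate answer together with the source it came from."""
    source: Source
    answer: str


def resolve(claims: List[Claim]) -> Optional[Claim]:
    """Return the claim from the highest-priority source, or None if there are none."""
    if not claims:
        return None
    # IntEnum members compare as integers, so max() picks the top-priority source.
    return max(claims, key=lambda c: c.source)


if __name__ == "__main__":
    conflicting = [
        Claim(Source.PARAMETRIC, "answer from pretraining"),
        Claim(Source.CONTEXT, "answer from a retrieved article"),
        Claim(Source.INSTRUCTION, "answer fixed by the user"),
    ]
    print(resolve(conflicting).answer)
```

A trained model has to make this choice implicitly while generating text; the explicit resolver only makes the intended ordering concrete and could serve as a reference label when checking whether a model's output follows the hierarchy.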
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The paper attempts to offer a unified perspective for knowledge editing and RAG: handling conflicting parametric knowledge and external knowledge, which is interesting.
1. [soundness] The paper proposed that there is a hierarchy between knowledge preferences. I agree that: for knowledge editing, instruction knowledge has higher priority than parametric knowledge; for RAG, context knowledge has higher priority than parametric knowledge. But I question the hierarchy between instruction knowledge and context knowledge, and this hierarchy finds no natural applications. - line 122 exemplifies instruction knowledge with assumptions and language requirements. The pape
The authors both benchmark and introduce methods for improving performance on tasks requiring knowledge preference resolution. The compiled benchmarks and the data synthesis methods seem well-motivated and would be a valuable contribution to the community. Their data synthesis method yields solid gains in model performance, without requiring human annotation. The work also includes extensive in-depth analysis including 1) benchmarking performance for various types of knowledge conflicts 2) explo
My large concern is my difficulty with holistically understanding the flow and details of the contributions. The writing in some crucial sections was often very dense and hard to follow. While I spent considerable time working through the contributions, I think it is possible I misinterpreted some of the key contributions (in the evaluation datasets that were compiled+augmented, and in the synthetic data creation process). There are *many* components to this work (sourcing benchmarks for various
- **Originality**: To my knowledge, while the issue of knowledge conflicts has been discussed, the proposal of a three-tiered knowledge framework is novel. - **Quality**: The overall quality of the paper is adequate; it identifies a pertinent problem and suggests a relatively appropriate method to address it. - **Clarity**: The paper is clearly structured, making it easy to follow. - **Significance**: Knowledge conflicts in RAG are a real-world issue; addressing and clarifying these conflicts is
- The three-tiered knowledge management system is merely a specific case of instruction-following fine-tuning, and it may not be universally applicable. For instance, if the retrieved articles contain conflicting viewpoints, which one should be prioritized? The assumption that "retrieved content is always correct" is too strong. - The work primarily only uses datasets created by others to construct a fine-tuning dataset and fine-tunes a single model (Mistral 7B). - Most experimental models are s
1. The main idea of this paper is novel, as the prioritization of parametric, contextual, and instruction knowledge is suited to real-world RAG scenarios. The paper presents an innovative framework for establishing a hierarchy of knowledge preference in large language models, addressing a critical issue in LLM behavior when faced with conflicting knowledge sources. 2. The construction of the benchmark is thoroughly explained, providing a reliable dataset for the RAG domain. The proposed data sy
1. While the experiments validate the proposed method, the paper lacks sufficient ablation studies on key components of the model, such as the effects of removing instruction knowledge or contextual knowledge prioritization. These ablations would offer better insights into the importance of each element in the hierarchical framework. 2. Although the benchmark created by the authors integrates a wide range of existing datasets, the rationale behind selecting these specific datasets is not clear
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
