KaFT: Knowledge-aware Fine-tuning for Boosting LLMs' Domain-specific Question-Answering Performance

Qihuang Zhong; Liang Ding; Xiantao Cai; Juhua Liu; Bo Du; Dacheng Tao

arXiv:2505.15480 · cs.CL · May 29, 2025

TL;DR

This paper introduces KaFT, a novel fine-tuning method that dynamically adjusts training sample weights based on knowledge conflict levels to enhance large language models' domain-specific question-answering performance.

Contribution

The paper proposes a conflict-aware fine-tuning approach that improves LLMs by selectively weighting training data according to knowledge conflict, addressing limitations of traditional supervised fine-tuning.

Findings

01 KaFT significantly improves QA performance across four LLMs.

02 Training data with high conflicts can harm model performance if not properly managed.

03 Appropriate use of conflict data enhances model generalization and reduces hallucinations.

Abstract

Supervised fine-tuning (SFT) is a common approach to improve the domain-specific question-answering (QA) performance of large language models (LLMs). However, recent literature reveals that due to the conflicts between LLMs' internal knowledge and the context knowledge of training data, vanilla SFT using the full QA training set is usually suboptimal. In this paper, we first design a query diversification strategy for robust conflict detection and then conduct a series of experiments to analyze the impact of knowledge conflict. We find that 1) training samples with varied conflicts contribute differently, where SFT on the data with large conflicts leads to catastrophic performance drops; 2) compared to directly filtering out the conflict data, appropriately applying the conflict data would be more beneficial. Motivated by this, we propose a simple-yet-effective Knowledge-aware…
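
The abstract describes two ideas: detecting conflict by asking the model several diversified versions of a query, and keeping conflict data in training with an appropriate weight rather than filtering it out. The sketch below is one possible reading of those ideas, not the authors' implementation. The function names, the exact-match comparison, the 0.75 threshold, and the 0.1 down-weight are all hypothetical choices made for illustration; the truncated abstract does not specify how KaFT scores conflict or sets weights.

```python
from typing import Callable, Sequence


def conflict_score(
    answer_fn: Callable[[str], str],
    query_variants: Sequence[str],
    gold_answer: str,
) -> float:
    """Fraction of diversified queries on which the model disagrees with the gold answer.

    Assumption: disagreement is measured by case-insensitive exact match.
    """
    if not query_variants:
        raise ValueError("need at least one query variant")
    gold = gold_answer.strip().lower()
    wrong = sum(1 for q in query_variants if answer_fn(q).strip().lower() != gold)
    return wrong / len(query_variants)


def sample_weight(score: float, high: float = 0.75, down_weight: float = 0.1) -> float:
    """Hypothetical mapping from conflict score to a training weight.

    High-conflict samples are down-weighted rather than dropped, reflecting the
    abstract's finding that filtering conflict data out is not the best option.
    """
    return down_weight if score >= high else 1.0


def weighted_mean_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weight-normalized average of per-sample losses."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(l * w for l, w in zip(losses, weights)) / total


if __name__ == "__main__":
    # Toy stand-in for an LLM: answers correctly only when the word "capital" appears.
    def toy_model(query: str) -> str:
        return "paris" if "capital" in query else "unknown"

    variants = [
        "What is the capital of France?",
        "France's capital city is?",
        "Which city governs France?",
    ]
    score = conflict_score(toy_model, variants, "Paris")
    print(score, sample_weight(score))
```

In a real training loop the per-sample losses would come from the model's token-level loss computed without reduction, and the weights would be precomputed once per training sample before fine-tuning starts.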

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Shrink and Fine-Tune · Sparse Evolutionary Training