SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention
William F. Shen, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane

TL;DR
SEAT is a fine-tuning method that improves knowledge adaptation in LLMs while preserving the model's ability to abstain on unknown queries, a safeguard that matters for safety in high-stakes applications.
Contribution
SEAT combines sparse tuning with entity-perturbed KL regularization to maintain epistemic abstention without requiring alignment data, explicit boundary probing, or post-hoc re-alignment.
Findings
SEAT improves abstention on unknown queries by 18%-101%.
SEAT retains near-perfect knowledge acquisition.
SEAT produces coherent, context-aware abstentions.
Abstract
Adapting LLMs with new knowledge is increasingly important, but standard fine-tuning often erodes aligned epistemic abstention: the ability to acknowledge when the model does not know. This failure mode is especially concerning in high-stakes settings, where abstention is a critical safeguard against hallucination. We present SEAT, a preventive fine-tuning method that preserves epistemic abstention while maintaining strong knowledge acquisition. SEAT combines sparse tuning, which constrains global activation drift, with entity-perturbed KL regularization, which sharpens local epistemic boundaries and prevents spillover to neighboring knowledge. Crucially, SEAT requires no alignment data, explicit boundary probing, or post-hoc re-alignment, making it attractive for lightweight and privacy-sensitive adaptation. Across models and datasets, SEAT improves human-evaluated abstention on…
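The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the random choice of trainable parameters, the KL direction, the `kl_weight` coefficient, and every function name below are assumptions made for illustration. It only shows the general shape of the idea: update a sparse subset of parameters, and penalize divergence from the base model's output distribution on inputs whose entity has been swapped.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def perturb_entity(tokens, entity_span, replacement):
    """Replace the entity tokens in [start, end) with a different entity."""
    start, end = entity_span
    return tokens[:start] + replacement + tokens[end:]


def sparse_mask(num_params, density, seed=0):
    """Binary mask (1 = trainable, 0 = frozen).

    Assumption: parameters are chosen uniformly at random; the paper may
    use a different selection rule.
    """
    rng = random.Random(seed)
    k = max(1, round(num_params * density))
    chosen = set(rng.sample(range(num_params), k))
    return [1 if i in chosen else 0 for i in range(num_params)]


def masked_update(params, grads, mask, lr):
    """Gradient step that only moves parameters where mask == 1."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]


def regularized_loss(task_loss, base_logits, tuned_logits, kl_weight):
    """Task loss plus a KL penalty computed on an entity-perturbed input.

    base_logits and tuned_logits are the frozen and fine-tuned models'
    next-token logits for the same perturbed prompt (hypothetical inputs).
    """
    kl = kl_divergence(softmax(base_logits), softmax(tuned_logits))
    return task_loss + kl_weight * kl
```

In this sketch the mask limits how much of the model can move at all, while the KL term pulls the tuned model back toward the base model's behavior on neighboring entities it was not trained on, which is where an aligned base model would be expected to abstain.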
