Fine-Tuned LLMs Know They Don't Know: A Parameter-Efficient Approach to Recovering Honesty

Zeyu Shi; Ziming Wang; Tianyu Chen; Shiqi Gao; Haoyi Zhou; Qingyun Sun; Jianxin Li

arXiv:2511.12991·cs.CL·November 18, 2025

TL;DR

This paper introduces HCNR, a parameter-efficient method that restores honesty in fine-tuned LLMs by surgically repairing key neurons, recovering honesty with less data and in less time than baseline methods.

Contribution

The paper presents HCNR, a novel neuron-level restoration technique that effectively recovers honesty in fine-tuned LLMs with minimal data and computational resources.

Findings

01. Restores 33.25% of the honesty lost by fine-tuned LLMs
02. Achieves a 2.23x speedup over baseline methods
03. Uses over 10x less data than baseline methods

Abstract

The honesty of Large Language Models (LLMs) is increasingly important for safe deployment in high-stakes domains. However, this crucial trait is severely undermined by supervised fine-tuning (SFT), a common technique for model specialization. Existing recovery methods rely on data-intensive global parameter adjustments, implicitly assuming that SFT deeply corrupts the models' ability to recognize their knowledge boundaries. However, we observe that fine-tuned LLMs still preserve this ability; what is damaged is their capacity to faithfully express that awareness. Building on this, we propose Honesty-Critical Neurons Restoration (HCNR) to surgically repair this suppressed capacity. HCNR identifies and restores key expression-governing neurons to their pre-trained state while harmonizing them with task-oriented neurons via Hessian-guided compensation. Experiments on four QA tasks and five…
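
The restoration step described in the abstract (resetting selected neurons to their pre-trained state) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-neuron `scores`, the choice of top-`k` selection, and the function name `restore_neurons` are all hypothetical, since the abstract does not say how honesty-critical neurons are identified. The Hessian-guided compensation step is omitted entirely.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one row of weights per neuron


def restore_neurons(
    pretrained: Matrix,
    finetuned: Matrix,
    scores: Sequence[float],
    k: int,
) -> Matrix:
    """Return a copy of `finetuned` with its k highest-scoring rows reset to `pretrained`.

    `scores` is a hypothetical per-neuron importance score (an assumption;
    the source does not specify how neurons are ranked).
    """
    if not (len(pretrained) == len(finetuned) == len(scores)):
        raise ValueError("pretrained, finetuned and scores must have the same length")
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of neurons")

    # Indices of the k neurons with the highest scores.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    selected = set(top)

    # Copy every row; take it from the pre-trained weights only if selected.
    return [
        list(pretrained[i]) if i in selected else list(finetuned[i])
        for i in range(len(finetuned))
    ]


if __name__ == "__main__":
    w_pre = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    w_ft = [[1.5, 0.5], [2.0, 2.5], [0.0, 9.0]]
    neuron_scores = [0.1, 0.2, 0.9]  # made-up values for illustration only
    print(restore_neurons(w_pre, w_ft, neuron_scores, k=1))
    # prints [[1.5, 0.5], [2.0, 2.5], [3.0, 3.0]]
```

Only the selected rows change, so the remaining neurons keep whatever the fine-tuning learned; in the paper's description, an additional compensation step then harmonizes the restored neurons with the task-oriented ones.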

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Topic Modeling