Beyond Superficial Unlearning: Sharpness-Aware Robust Erasure of Hallucinations in Multimodal LLMs

Xianya Fang; Feiyang Ren; Xiang Chen; Yu Tian; Zhen Bi; Haiyang Yu; Sheng-Jun Huang

arXiv:2601.16527·cs.LG·May 19, 2026

TL;DR

This paper introduces SARE, a novel method for robustly erasing hallucinations in multimodal LLMs by stabilizing the model's loss landscape to prevent hallucination resurgence after unlearning.

Contribution

SARE formulates unlearning as a min-max optimization with Targeted-SAM to ensure geometric stability and persistent hallucination suppression in multimodal LLMs.

Findings

01. SARE outperforms existing baselines in erasure effectiveness.

02. SARE maintains hallucination suppression against relearning and parameter updates.

03. SARE preserves the overall generation quality of the model.

Abstract

Multimodal LLMs are powerful but prone to object hallucinations: they describe non-existent entities, which harms reliability. While recent unlearning methods attempt to mitigate this, we identify a critical flaw: structural fragility. We empirically demonstrate that standard erasure achieves only superficial suppression, trapping the model in sharp minima where hallucinations catastrophically resurge after lightweight relearning. To ensure geometric stability, we propose SARE, which casts unlearning as a targeted min-max optimization problem and uses a Targeted-SAM mechanism to explicitly flatten the loss landscape around hallucinated concepts. By suppressing hallucinations under simulated worst-case parameter perturbations, our framework ensures robust removal that remains stable under weight shifts. Extensive experiments demonstrate that SARE significantly outperforms baselines in erasure efficacy…
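
The min-max idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the quadratic losses, the names `forget_loss`, `retain_grad`, `targeted_sam_step`, and the hyperparameters `lr`, `rho`, `lam` are all assumptions chosen for a toy example. The sketch shows a generic sharpness-aware (SAM-style) update in which the worst-case perturbation is applied only to the forget objective, which is one plausible reading of "targeted".

```python
import numpy as np

# Hypothetical toy setup (not from the paper): a quadratic "forget" loss
# with one sharp and one flat direction, plus a quadratic "retain" loss
# that pulls the weights toward a reference point.
A = np.array([10.0, 0.1])          # diagonal curvature of the forget loss
W_RETAIN = np.array([0.5, 0.5])    # weights preferred by the retain loss


def forget_loss(w):
    """Toy stand-in for how strongly the hallucinated concept is expressed."""
    return 0.5 * float(np.sum(A * w * w))


def forget_grad(w):
    return A * w


def retain_grad(w):
    return w - W_RETAIN


def targeted_sam_step(w, lr=0.05, rho=0.05, lam=0.1):
    """One min-max update that perturbs only the forget objective."""
    g = forget_grad(w)
    # Inner max (first-order approximation): worst-case weight shift
    # inside an L2 ball of radius rho, along the forget gradient.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Outer min: descend on the forget gradient taken at the perturbed
    # point; the retain term uses the unperturbed weights.
    g_robust = forget_grad(w + eps)
    return w - lr * (g_robust + lam * retain_grad(w))


def run(steps=200):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = targeted_sam_step(w)
    return w
```

Calling `run()` returns weights whose forget loss stays low even after a further shift of norm `rho`, which is the toy analogue of suppression that survives small parameter changes. In a real model the two gradient evaluations would be two forward-backward passes per step.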

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Generative Adversarial Networks and Image Synthesis