Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis

Yicheng Lang; Kehan Guo; Yue Huang; Yujun Zhou; Haomin Zhuang; Tianyu Yang; Yao Su; Xiangliang Zhang

arXiv:2502.13996·cs.LG·February 21, 2025

Open Access

TL;DR

This paper introduces UNCD, a comprehensive framework for evaluating and improving LLM unlearning by using cognitive diagnosis modeling to assess and target the removal of harmful knowledge more effectively.

Contribution

The paper presents UNCD, a novel cognitive diagnosis-based evaluation framework and benchmark for fine-grained assessment and enhancement of LLM unlearning methods.

Findings

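01. UNCD provides a more nuanced evaluation of unlearning effectiveness than single-value metrics such as QA accuracy.

02. UNCD-Agent improves the removal of harmful capabilities by diagnosing knowledge remnants and generating targeted unlearning data.

03. Extensive experiments validate UNCD's effectiveness across multiple methods and models.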

Abstract

Due to the widespread use of LLMs and the rising critical ethical and safety concerns, LLM unlearning methods have been developed to remove harmful knowledge and undesirable capabilities. In this context, evaluations are mostly based on single-value metrics such as QA accuracy. However, these metrics often fail to capture the nuanced retention of harmful knowledge components, making it difficult to assess the true effectiveness of unlearning. To address this issue, we propose UNCD (UNlearning evaluation via Cognitive Diagnosis), a novel framework that leverages Cognitive Diagnosis Modeling for fine-grained evaluation of LLM unlearning. Our dedicated benchmark, UNCD-Cyber, provides a detailed assessment of the removal of dangerous capabilities. Moreover, we introduce UNCD-Agent, which refines unlearning by diagnosing knowledge remnants and generating targeted unlearning data. Extensive…
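To illustrate the abstract's point that a single-value metric can hide retained knowledge, here is a minimal sketch that compares overall QA accuracy with a per-concept breakdown. It is not the UNCD method: a real cognitive diagnosis model estimates latent mastery from responses, whereas this sketch only tallies accuracy per tagged concept. The concept names and response data are hypothetical.

```python
from collections import defaultdict

# Hypothetical evaluation results for an "unlearned" model:
# (knowledge concepts the question tests, whether the model answered correctly).
responses = [
    ({"phishing"}, False),
    ({"phishing"}, False),
    ({"phishing", "sql_injection"}, False),
    ({"sql_injection"}, True),
    ({"sql_injection"}, True),
    ({"buffer_overflow"}, False),
]


def overall_accuracy(responses):
    """Single-value metric: fraction of questions answered correctly."""
    return sum(correct for _, correct in responses) / len(responses)


def per_concept_accuracy(responses):
    """Fraction of correct answers among questions tagged with each concept."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for concepts, correct in responses:
        for concept in concepts:
            totals[concept] += 1
            hits[concept] += int(correct)
    return {concept: hits[concept] / totals[concept] for concept in totals}


if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(responses):.2f}")
    for concept, acc in sorted(per_concept_accuracy(responses).items()):
        print(f"{concept}: {acc:.2f}")
```

In this toy data the overall accuracy looks low, which a single-value metric would read as successful unlearning, yet the breakdown shows that one concept is still answered correctly most of the time.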

Taxonomy

Topics: Text Readability and Simplification

Methods: Balanced Selection