Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation

Xinshuo Hu; Dongfang Li; Baotian Hu; Zihao Zheng; Zhenyu Liu; Min Zhang

arXiv:2308.08090 · cs.CL · January 19, 2024 · 1 citation

TL;DR

This paper introduces a novel parameter-efficient module operation, Ext-Sub, to unlearn deficiencies like untruthfulness and toxicity in large language models while maintaining their core capabilities.

Contribution

The paper proposes a method that selectively removes the deficiency capability from anti-expert PEMs, making LLMs more truthful and less toxic without degrading overall performance.

Findings

01 Significant improvement in the truthfulness and detoxification of LLMs.

02 Preservation of language modeling and reasoning abilities.

03 Effective deficiency unlearning with minimal impact on core skills.

Abstract

Large language models (LLMs) have been widely used in various applications but are known to suffer from issues related to untruthfulness and toxicity. While parameter-efficient modules (PEMs) have demonstrated their effectiveness in equipping models with new skills, leveraging PEMs for deficiency unlearning remains underexplored. In this work, we propose a PEM operation approach, namely Extraction-before-Subtraction (Ext-Sub), to enhance the truthfulness and detoxification of LLMs through the integration of an "expert" PEM and an "anti-expert" PEM. Remarkably, even anti-expert PEMs possess valuable capabilities due to their proficiency in generating fabricated content, which necessitates language modeling and logical narrative competence. Rather than merely negating the parameters, our approach involves extracting and eliminating solely the deficiency capability within the anti-expert PEM…
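The abstract is truncated here and does not spell out the operation, so the following is only a minimal sketch of one way to read "extraction before subtraction", not the authors' implementation. It assumes each PEM can be flattened into a plain parameter vector, that the direction shared by the expert and anti-expert (taken here as their sum) stands in for general capability, and that the part of the anti-expert orthogonal to that direction stands in for the deficiency. The function names and the scaling weight `lam` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length parameter vectors."""
    return sum(a * b for a, b in zip(u, v))


def extract_deficiency(expert, anti_expert):
    """Return the part of anti_expert orthogonal to the shared direction.

    Assumption (not from the source): the shared "general capability"
    direction is expert + anti_expert.
    """
    shared = [e + a for e, a in zip(expert, anti_expert)]
    norm_sq = dot(shared, shared)
    if norm_sq == 0.0:
        # No shared direction: treat the whole anti-expert as deficiency.
        return list(anti_expert)
    scale = dot(anti_expert, shared) / norm_sq
    # Remove the projection onto the shared direction; the rest is "deficiency".
    return [a - scale * s for a, s in zip(anti_expert, shared)]


def ext_sub(expert, anti_expert, lam=1.0):
    """Subtract only the extracted deficiency component from the expert."""
    deficiency = extract_deficiency(expert, anti_expert)
    return [e - lam * d for e, d in zip(expert, deficiency)]


def plain_subtraction(expert, anti_expert, lam=1.0):
    """Baseline: negate the whole anti-expert, shared capability included."""
    return [e - lam * a for e, a in zip(expert, anti_expert)]


# Toy vectors: coordinate 0 is shared by both PEMs, coordinate 1 differs.
expert = [1.0, 1.0]
anti_expert = [1.0, -1.0]
print(ext_sub(expert, anti_expert))            # [1.0, 2.0]
print(plain_subtraction(expert, anti_expert))  # [0.0, 2.0]
```

In this toy case plain subtraction zeroes out the shared coordinate, while the extraction step leaves it untouched and only moves the expert away from the anti-expert along the coordinate where they disagree. Real PEMs such as LoRA adapters are sets of matrices, so any practical version would need to apply the operation per module.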

Code & Models

Repositories

hitsz-tmg/ext-sub
PyTorch · Official

Videos

Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation · Underline

Taxonomy

Topics: Software Engineering Research · Machine Learning and Data Classification · Topic Modeling