Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation

Huimin Lu; Masaru Isonuma; Junichiro Mori; Ichiro Sakata

arXiv:2407.16951·cs.CL·July 25, 2024

TL;DR

This paper introduces a novel unlearning-based debiasing method for large language models that reduces biases and toxicity, with evidence that debiasing one bias type can transfer to mitigate others across domains.

Contribution

The paper proposes a mask language modeling unlearning technique that selectively forgets the harmful part of biased text, and it demonstrates that the resulting bias mitigation carries across domains.

Findings

01. Effective bias reduction while preserving language modeling quality

02. Unlearning one bias can help mitigate other biases across domains

03. Potential for improved debiasing strategies through transfer unlearning

Abstract

Large language models (LLMs) often inherit biases from vast amounts of training corpora. Traditional debiasing methods, while effective to some extent, do not completely eliminate memorized biases and toxicity in LLMs. In this paper, we study an unlearning-based approach to debiasing in LLMs by performing gradient ascent on hate speech against minority groups, i.e., minimizing the likelihood of biased or toxic content. Specifically, we propose a mask language modeling unlearning technique, which unlearns the harmful part of the text. This method enables LLMs to selectively forget and disassociate from biased and harmful content. Experimental results demonstrate the effectiveness of our approach in diminishing bias while maintaining the language modeling abilities. Surprisingly, the results also unveil an unexpected potential for cross-domain transfer unlearning: debiasing in one bias…
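
The abstract's core idea, gradient ascent restricted to the harmful part of a text, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy bigram logit table stands in for an LLM, and the names `masked_unlearning_step`, `harmful_mask`, `token_prob`, the vocabulary, and the learning rate are all hypothetical choices made for illustration. The sketch raises the negative log-likelihood only at positions flagged as harmful, so other transitions are left untouched.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_unlearning_step(logits, tokens, harmful_mask, lr=0.5):
    """One gradient-ascent step on the NLL of flagged tokens only.

    logits[prev][nxt] is a toy bigram table (an assumed stand-in for an LLM).
    harmful_mask[i] == 1 marks tokens[i] as part of the harmful span.
    """
    for i in range(1, len(tokens)):
        if not harmful_mask[i]:
            continue  # unflagged tokens contribute no update
        prev, target = tokens[i - 1], tokens[i]
        probs = softmax(logits[prev])
        for k in range(len(probs)):
            # d log p(target | prev) / d logit_k = 1[k == target] - p_k
            grad_log_p = (1.0 if k == target else 0.0) - probs[k]
            # Ascent on NLL is descent on log-likelihood.
            logits[prev][k] -= lr * grad_log_p
    return logits


def token_prob(logits, prev, nxt):
    """Probability the toy model assigns to nxt following prev."""
    return softmax(logits[prev])[nxt]


# Hypothetical vocabulary; "SLUR" is a placeholder for a harmful token.
VOCAB = ["<s>", "group", "is", "SLUR", "kind"]
logits = [[0.0] * len(VOCAB) for _ in VOCAB]
logits[2][3] = 2.0  # the toy model initially favors "is" -> "SLUR"

tokens = [0, 1, 2, 3]        # "<s> group is SLUR"
harmful_mask = [0, 0, 0, 1]  # only the final token is flagged

before = token_prob(logits, 2, 3)
benign_before = token_prob(logits, 1, 2)
for _ in range(20):
    masked_unlearning_step(logits, tokens, harmful_mask)
after = token_prob(logits, 2, 3)
benign_after = token_prob(logits, 1, 2)

print(f"harmful transition: {before:.3f} -> {after:.3f}")
print(f"benign transition:  {benign_before:.3f} -> {benign_after:.3f}")
```

In this toy setting the probability of the flagged continuation falls while the unflagged "group" -> "is" transition is unchanged, which mirrors the selective forgetting the abstract describes. Because gradient ascent on the NLL has no natural stopping point, a real setup would presumably need a step budget or a check on language modeling quality; the fixed 20 steps here are an arbitrary assumption.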
