Mitigating Social Biases in Language Models through Unlearning
Omkar Dige, Diljot Singh, Tsz Fung Yau, Qixuan Zhang, Borna Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak

TL;DR
This paper explores two unlearning techniques for reducing social biases in language models. It shows that negation via Task Vector debiases more effectively than Partitioned Contrastive Gradient Unlearning (PCGU) and better preserves model performance.
Contribution
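It applies two unlearning methods to bias mitigation in large language models and evaluates them empirically: PCGU, applied to decoder models and given a distributed implementation for large models, and negation via Task Vector. Negation via Task Vector gives the better results.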
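Negation via Task Vector can be illustrated with the standard task-arithmetic formulation. In that formulation, a task vector is the element-wise difference between a model's fine-tuned and pre-trained weights, and subtracting it moves the model away from the behavior learned during fine-tuning. The sketch below assumes that formulation. It further assumes, which this summary does not state, that the task vector comes from fine-tuning the pre-trained model on biased text. Parameters are plain lists of floats rather than framework tensors. The names `task_vector`, `negate_task_vector` and `scale` are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    """Element-wise tau = theta_finetuned - theta_pretrained for every parameter."""
    return {
        name: [ft - pre for ft, pre in zip(finetuned[name], weights)]
        for name, weights in pretrained.items()
    }


def negate_task_vector(pretrained: Params, tau: Params, scale: float = 1.0) -> Params:
    """theta_new = theta_pretrained - scale * tau: move away from the fine-tuned behavior."""
    return {
        name: [w - scale * t for w, t in zip(weights, tau[name])]
        for name, weights in pretrained.items()
    }


if __name__ == "__main__":
    pretrained = {"layer.weight": [1.0, 2.0]}
    # Toy weights standing in for a model fine-tuned on biased text (an assumption).
    biased = {"layer.weight": [1.5, 1.0]}
    tau = task_vector(pretrained, biased)
    print(tau)                                             # {'layer.weight': [0.5, -1.0]}
    print(negate_task_vector(pretrained, tau))             # {'layer.weight': [0.5, 3.0]}
    print(negate_task_vector(pretrained, tau, scale=0.5))  # {'layer.weight': [0.75, 2.5]}
```

The scaling coefficient sets how far the weights move away from the fine-tuned direction. In task-arithmetic practice it is typically chosen on held-out data, trading bias removal against drift from the original model.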
Findings
Negation via Task Vector reduces the bias score by 11.8% on LLaMA-2 7B.
Negation via Task Vector outperforms PCGU in debiasing effectiveness.
The methods cause only minimal deterioration in model perplexity.
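Perplexity is the findings' check on how much debiasing degrades a model. It is the exponential of the mean per-token negative log-likelihood. The minimal sketch below assumes the model's natural-log token probabilities are already available. `relative_change` is a hypothetical helper that expresses deterioration as a fraction of the original perplexity. The paper may report the change differently.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), using natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def relative_change(before: float, after: float) -> float:
    """Hypothetical helper: fractional change, e.g. 0.05 means perplexity rose by 5%."""
    return (after - before) / before


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token has perplexity 4.
    log_probs = [math.log(0.25)] * 4
    print(round(perplexity(log_probs), 6))        # 4.0
    print(round(relative_change(10.0, 10.5), 6))  # 0.05
```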
Abstract
Mitigating bias in language models (LMs) has become a critical problem due to the widespread deployment of LMs. Numerous approaches revolve around data pre-processing and fine-tuning of language models, tasks that can be both time-consuming and computationally demanding. Consequently, there is a growing interest in machine unlearning techniques given their capacity to induce the forgetting of undesired behaviors of the existing pre-trained or fine-tuned models with lower computational cost. In this work, we explore two unlearning methods, (1) Partitioned Contrastive Gradient Unlearning (PCGU) applied on decoder models and (2) Negation via Task Vector, to reduce social biases in state-of-the-art and open-source LMs such as LLaMA-2 and OPT. We also implement distributed PCGU for large models. It is empirically shown, through quantitative and qualitative analyses, that negation via Task…
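At a high level, PCGU partitions a model's weights and compares the gradients that a contrastive sentence pair (an advantaged and a disadvantaged sentence) induces on each partition. It then updates only the partitions where those gradients disagree most. The sketch below is a simplified illustration of that idea on a single weight matrix, under these assumptions:

- each row is one partition;
- the gradients of the two sentences' log-probabilities are passed in rather than computed by autodiff;
- the k rows with the lowest cosine similarity are selected;
- the update is a gradient step that lowers the advantaged sentence's log-probability on those rows.

The update rule and all function names are assumptions for illustration, not the paper's implementation. The distributed variant mentioned in the abstract is not shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a row with a zero gradient carries no contrastive signal.
        return 1.0
    return dot / (norm_u * norm_v)


def select_contrastive_rows(grad_adv: Matrix, grad_dis: Matrix, k: int) -> List[int]:
    """Indices of the k rows whose two gradients are least similar (most contrastive)."""
    scored = sorted(
        (cosine_similarity(ga, gd), i)
        for i, (ga, gd) in enumerate(zip(grad_adv, grad_dis))
    )
    return [i for _, i in scored[:k]]


def pcgu_step(weights: Matrix, grad_adv: Matrix, grad_dis: Matrix,
              k: int, lr: float) -> Matrix:
    """Update only the selected rows, stepping against the gradient of log p(advantaged)."""
    chosen = set(select_contrastive_rows(grad_adv, grad_dis, k))
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if i in chosen else list(w_row)
        for i, (w_row, g_row) in enumerate(zip(weights, grad_adv))
    ]


if __name__ == "__main__":
    weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    grad_adv = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    grad_dis = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
    print(select_contrastive_rows(grad_adv, grad_dis, k=1))     # [1]
    print(pcgu_step(weights, grad_adv, grad_dis, k=1, lr=0.5))  # [[1.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
```

Restricting the update to the most contrastive partitions is meant to concentrate the change on weights tied to the biased preference and leave the rest of the model untouched.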
