Conceptor-Aided Debiasing of Large Language Models
Li S. Yifei, Lyle Ungar, João Sedoc

TL;DR
This paper introduces conceptor-based methods for debiasing large language models. Conceptor post-processing achieves state-of-the-art bias reduction while preserving model accuracy, and the paper also explores an architecture that incorporates bias mitigation during training.
Contribution
The paper proposes two novel conceptor-based debiasing techniques for LLMs, a post-processing method and a new architecture (CI-BERT). It reports improved bias mitigation and offers insights on how to construct the bias subspace.
Findings
Conceptor post-processing achieves state-of-the-art debiasing results.
The methods effectively mitigate intersectional bias.
CI-BERT reduces bias but at some cost to accuracy.
Abstract
Pre-trained large language models (LLMs) reflect the inherent social biases of their training corpus. Many methods have been proposed to mitigate this issue, but they often fail to debias or they sacrifice model accuracy. We use conceptors, a soft projection method, to identify and remove the bias subspace in LLMs such as BERT and GPT. We propose two methods of applying conceptors: (1) bias subspace projection by post-processing with the conceptor NOT operation; and (2) a new architecture, conceptor-intervened BERT (CI-BERT), which explicitly incorporates the conceptor projection into all layers during training. We find that conceptor post-processing achieves state-of-the-art (SoTA) debiasing results while maintaining LLMs' performance on the GLUE benchmark. Further, it is robust in various scenarios and can mitigate intersectional bias efficiently by its AND operation on the existing bias…
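The abstract's "soft projection" and its NOT and AND operations can be illustrated with a minimal NumPy sketch. This is not the authors' code: the formulas are the standard ones from the general conceptor literature (C = R(R + α⁻²I)⁻¹, NOT C = I − C, C AND B = (C⁻¹ + B⁻¹ − I)⁻¹), and the function names, the aperture value `alpha=2.0`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def conceptor(X, alpha=2.0):
    """Conceptor of the rows of X (n_samples x dim); alpha is an assumed aperture."""
    n, d = X.shape
    R = X.T @ X / n                                   # correlation matrix of the rows
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))

def NOT(C):
    """Soft projection onto the complement of the directions captured by C."""
    return np.eye(C.shape[0]) - C

def AND(C, B):
    """Conceptor AND; this closed form assumes C and B are both invertible."""
    I = np.eye(C.shape[0])
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

# Hypothetical toy data: two sets of 4-d "attribute word embeddings".
# Set a varies mostly along axis 0, set b mostly along axis 1.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 4)) * np.array([5.0, 0.1, 0.1, 0.1])
X_b = rng.normal(size=(200, 4)) * np.array([0.1, 5.0, 0.1, 0.1])

C_a = conceptor(X_a)
C_b = conceptor(X_b)
debias_a = NOT(C_a)            # post-processing matrix applied to embeddings

e0 = np.array([1.0, 0.0, 0.0, 0.0])   # direction dominated by set a
e2 = np.array([0.0, 0.0, 1.0, 0.0])   # direction with little variance in set a
shrunk = np.linalg.norm(debias_a @ e0)  # component along the "bias" axis
kept = np.linalg.norm(debias_a @ e2)    # component along an unrelated axis

C_ab = AND(C_a, C_b)           # combines two existing conceptors

print("norm after NOT along bias axis:", shrunk)
print("norm after NOT along other axis:", kept)
```

Unlike a hard projection, the NOT matrix damps each direction in proportion to how much variance the attribute embeddings have along it, so the dominant axis is shrunk strongly while low-variance axes are left nearly intact. How the paper selects attribute words, which layers it post-processes, and how it tunes the aperture are not stated in this summary.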
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Weight Decay · Adam · Linear Layer · Dense Connections
