Conceptor-Aided Debiasing of Large Language Models

Li S. Yifei; Lyle Ungar; João Sedoc

arXiv:2211.11087·cs.CL·November 1, 2023·1 cites

Open Access

TL;DR

This paper introduces conceptor-based methods for debiasing large language models: a post-processing method that achieves state-of-the-art bias reduction while preserving model accuracy, and a new architecture that incorporates bias mitigation during training.

Contribution

It proposes two novel conceptor-based debiasing techniques for LLMs, including a post-processing method and a new architecture, demonstrating improved bias mitigation and insights on bias subspace construction.

Findings

01. Conceptor post-processing achieves state-of-the-art debiasing results.

02. The methods effectively mitigate intersectional bias.

03. CI-BERT reduces bias but at some cost to accuracy.

Abstract

Pre-trained large language models (LLMs) reflect the inherent social biases of their training corpus. Many methods have been proposed to mitigate this issue, but they often fail to debias or they sacrifice model accuracy. We use conceptors, a soft projection method, to identify and remove the bias subspace in LLMs such as BERT and GPT. We propose two methods of applying conceptors: (1) bias subspace projection by post-processing with the conceptor NOT operation; and (2) a new architecture, conceptor-intervened BERT (CI-BERT), which explicitly incorporates the conceptor projection into all layers during training. We find that conceptor post-processing achieves state-of-the-art (SoTA) debiasing results while maintaining LLMs' performance on the GLUE benchmark. Further, it is robust in various scenarios and can mitigate intersectional bias efficiently by its AND operation on the existing bias…
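To make the idea of a conceptor as a soft projection more concrete, here is a minimal NumPy sketch. It assumes the standard definitions from the conceptor literature: C = R(R + α⁻²I)⁻¹ for a correlation matrix R, NOT C = I − C, and C AND B = (C⁻¹ + B⁻¹ − I)⁻¹ for invertible conceptors. The function names, the aperture value `alpha=2.0`, and the synthetic data are all illustrative assumptions; this is not the authors' implementation and does not reproduce their experiments.

```python
import numpy as np

def conceptor(X, alpha=2.0):
    """Conceptor of the rows of X (n_samples x dim): C = R (R + alpha^-2 I)^-1.

    alpha is the aperture; the value 2.0 is an arbitrary illustrative choice.
    """
    n, d = X.shape
    R = X.T @ X / n  # (uncentered) correlation matrix of the samples
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))

def conceptor_not(C):
    """Boolean NOT: softly projects onto the complement of C's subspace."""
    return np.eye(C.shape[0]) - C

def conceptor_and(C, B):
    """Boolean AND, in the simple form that assumes C and B are invertible."""
    d = C.shape[0]
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - np.eye(d))

def debias(E, C):
    """Post-process embeddings (rows of E) with the NOT of a bias conceptor."""
    return E @ conceptor_not(C).T

# Synthetic "bias" samples: large variance along axis 0, small noise elsewhere.
rng = np.random.default_rng(0)
dim = 8
direction = np.zeros(dim)
direction[0] = 1.0
bias_samples = (rng.normal(scale=0.1, size=(200, dim))
                + 5.0 * rng.normal(size=(200, 1)) * direction)

C = conceptor(bias_samples)      # soft projection onto the bias subspace
E = rng.normal(size=(5, dim))    # stand-in embeddings to be debiased
E_debiased = debias(E, C)        # component along axis 0 is strongly damped
```

Because the eigenvalues of C lie between 0 and 1 rather than being exactly 0 or 1, high-variance bias directions are damped heavily while low-variance directions are mostly preserved, which is what distinguishes a soft projection from a hard one.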

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods

Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Weight Decay · Adam · Linear Layer · Dense Connections