Large Language Model Bias Mitigation from the Perspective of Knowledge Editing

Ruizhe Chen; Yichen Li; Zikai Xiao; Zuozhu Liu

arXiv:2405.09341·cs.CL·July 2, 2024

TL;DR

This paper introduces a new benchmark and a fine-grained debiasing method for large language models that improves fairness without compromising knowledge accuracy.

Contribution

The paper proposes BiasKE, a benchmark for systematic bias assessment, and Fairness Stamp (FAST), a debiasing method that enables editable fairness through fine-grained calibration of individual biased knowledge.

Findings

01 FAST outperforms existing debiasing methods

02 Maintains knowledge integrity while reducing bias

03 Provides a comprehensive bias mitigation benchmark

Abstract

Existing debiasing methods inevitably make unreasonable or undesired predictions because they are designed and evaluated to achieve parity across different social groups while leaving aside individual facts, which results in modified existing knowledge. In this paper, we first establish a new bias mitigation benchmark, BiasKE, leveraging existing and additionally constructed datasets, which systematically assesses debiasing performance using complementary metrics on fairness, specificity, and generalization. Meanwhile, we propose a novel debiasing method, Fairness Stamp (FAST), which enables editable fairness through fine-grained calibration on individual biased knowledge. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with remarkable debiasing performance while not hampering overall model capability for knowledge preservation, highlighting the prospect of…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques