KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement
Jinhao Pan, Chahat Raj, Anjishnu Mukherjee, Sina Mansouri, Bowen Wei, Shloka Yada, Ziwei Zhu

TL;DR
KnowBias is a novel framework that enhances neurons encoding bias knowledge in large language models to effectively reduce social biases without retraining or degrading overall performance.
Contribution
It introduces a lightweight, data-efficient method to mitigate social bias by strengthening bias-related neurons, avoiding the drawbacks of traditional suppression-based approaches.
Findings
Achieves state-of-the-art debiasing across multiple benchmarks.
Preserves general capabilities of LLMs after bias mitigation.
Requires only a few yes/no questions and no retraining.
Abstract
Large language models (LLMs) exhibit social biases that reinforce harmful stereotypes, limiting their safe deployment. Most existing debiasing methods adopt a suppressive paradigm by modifying parameters, prompts, or neurons associated with biased behavior; however, such approaches are often brittle, weakly generalizable, data-inefficient, and prone to degrading general capability. We propose KnowBias, a lightweight and conceptually distinct framework that mitigates bias by strengthening, rather than suppressing, neurons encoding bias-knowledge. KnowBias identifies neurons encoding bias knowledge using a small set of bias-knowledge questions via attribution-based analysis, and selectively enhances them at inference time. This design enables strong debiasing while preserving general capabilities, generalizes across bias types and demographics, and is highly data efficient,…
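The identify-then-enhance idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the excerpt does not say which attribution method or enhancement rule KnowBias uses, so the code below assumes activation-times-gradient as the attribution score and a multiplicative factor `alpha` as the enhancement. The tiny network, the weights, and every function name are hypothetical.

```python
# Toy sketch only: hypothetical names and numbers, not the paper's method.

def relu(x):
    return max(0.0, x)

def hidden_activations(x, W1):
    """Hidden-layer activations of a one-layer ReLU network."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]

def logit(h, w2, scale=None):
    """Linear readout; `scale` multiplies each neuron's activation."""
    if scale is None:
        scale = [1.0] * len(h)
    return sum(w * s * a for w, s, a in zip(w2, scale, h))

def attribution_scores(questions, W1, w2):
    """Mean activation-times-gradient per hidden neuron (assumed attribution).

    With a linear readout, d(logit)/d(h_j) is simply w2[j].
    """
    scores = [0.0] * len(w2)
    for x in questions:
        h = hidden_activations(x, W1)
        for j, (a, w) in enumerate(zip(h, w2)):
            scores[j] += a * w / len(questions)
    return scores

def top_k_neurons(scores, k):
    """Indices of the k neurons with the highest attribution."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def enhancement_mask(n, selected, alpha):
    """Scale selected neurons by alpha; leave all others unchanged."""
    chosen = set(selected)
    return [alpha if j in chosen else 1.0 for j in range(n)]

# Stand-ins for a small set of encoded "bias-knowledge questions".
W1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
w2 = [0.5, 2.0, -1.0]
questions = [[1.0, 2.0], [0.0, 1.0]]

scores = attribution_scores(questions, W1, w2)
selected = top_k_neurons(scores, k=1)
mask = enhancement_mask(len(w2), selected, alpha=2.0)

h = hidden_activations(questions[0], W1)
baseline = logit(h, w2)
enhanced = logit(h, w2, scale=mask)
```

In this toy setup the weights are never modified: the mask is applied only when computing the output, which mirrors the abstract's claim that enhancement happens at inference time without retraining.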
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI
