KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement

Jinhao Pan; Chahat Raj; Anjishnu Mukherjee; Sina Mansouri; Bowen Wei; Shloka Yada; Ziwei Zhu

arXiv:2601.21864·cs.AI·January 30, 2026

Open Access

TL;DR

KnowBias is a novel framework that enhances neurons encoding bias knowledge in large language models to effectively reduce social biases without retraining or degrading overall performance.

Contribution

It introduces a lightweight, data-efficient method that mitigates social bias by strengthening neurons that encode bias knowledge, avoiding the brittleness, weak generalization, and capability loss of traditional suppression-based approaches.

Findings

01. Achieves state-of-the-art debiasing across multiple benchmarks.

02. Preserves the general capabilities of LLMs after bias mitigation.

03. Requires only a few yes/no questions and no retraining.

Abstract

Large language models (LLMs) exhibit social biases that reinforce harmful stereotypes, limiting their safe deployment. Most existing debiasing methods adopt a suppressive paradigm by modifying parameters, prompts, or neurons associated with biased behavior; however, such approaches are often brittle, weakly generalizable, data-inefficient, and prone to degrading general capability. We propose KnowBias, a lightweight and conceptually distinct framework that mitigates bias by strengthening, rather than suppressing, neurons encoding bias-knowledge. KnowBias identifies neurons encoding bias knowledge using a small set of bias-knowledge questions via attribution-based analysis, and selectively enhances them at inference time. This design enables strong debiasing while preserving general capabilities, generalizes across bias types and demographics, and is highly data efficient,…
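
The identify-then-enhance idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the excerpt does not specify the attribution method, so gradient-times-activation is used here as a stand-in, and the two-layer network, the random "question" vectors, and the names `attribution_scores`, `top_k`, `forward`, and `alpha` are all hypothetical.

```python
import random

random.seed(0)
N_IN, N_HID = 4, 8

# Hypothetical toy weights standing in for one MLP block of an LLM.
W1 = [[random.gauss(0, 1) for _ in range(N_HID)] for _ in range(N_IN)]
w2 = [random.gauss(0, 1) for _ in range(N_HID)]


def hidden(x):
    """ReLU hidden activations for one input vector."""
    return [max(0.0, sum(x[i] * W1[i][j] for i in range(N_IN)))
            for j in range(N_HID)]


def attribution_scores(inputs):
    """Mean |activation * gradient| per neuron over all inputs.

    The readout is linear, so d(output)/d(h_j) is simply w2[j].
    Assumption: this stands in for the paper's attribution analysis.
    """
    acts = [hidden(x) for x in inputs]
    return [sum(abs(h[j] * w2[j]) for h in acts) / len(acts)
            for j in range(N_HID)]


def top_k(scores, k):
    """Indices of the k highest-scoring neurons."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def forward(x, enhanced=(), alpha=1.0):
    """Scalar output; activations of `enhanced` neurons are scaled by alpha."""
    h = hidden(x)
    return sum((alpha if j in enhanced else 1.0) * h[j] * w2[j]
               for j in range(N_HID))


# Stand-ins for a small set of encoded yes/no bias-knowledge questions.
questions = [[random.gauss(0, 1) for _ in range(N_IN)] for _ in range(5)]

scores = attribution_scores(questions)
know_bias = top_k(scores, k=2)  # neurons selected for enhancement
baseline = [forward(x) for x in questions]
boosted = [forward(x, enhanced=know_bias, alpha=2.0) for x in questions]
```

The weights are never modified: the scaling is applied to activations during the forward pass, which mirrors the abstract's description of enhancement at inference time. In a real model the same effect would typically be obtained by intercepting a layer's activations, and both the number of selected neurons and the scaling factor would need tuning.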

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI