Self-Debias: Self-correcting for Debiasing Large Language Models

Xuan Feng; Shuai Zhao; Luwei Xiao; Tianlong Gu; Bo An

arXiv:2604.08243 · cs.CL · May 12, 2026

TL;DR

Self-Debias is a novel framework that enables large language models to self-correct social biases during reasoning, using a resource redistribution approach and online self-improvement with minimal supervision.

Contribution

The paper introduces a progressive, self-correcting debiasing method that reallocates probability mass from biased heuristics to unbiased reasoning paths and applies consistency filtering, reducing reliance on external interventions.

Findings

01. Achieves superior debiasing with only 20k annotated samples.
02. Preserves reasoning capabilities while reducing biases.
03. Enables autonomous self-correction during reasoning processes.

Abstract

Although Large Language Models (LLMs) demonstrate remarkable reasoning capabilities, inherent social biases often cascade through the Chain-of-Thought (CoT) process, leading to continuous "Bias Propagation". Existing debiasing methods focus primarily on static constraints or external interventions and fail to identify and interrupt this propagation once it is triggered. To address this limitation, we introduce Self-Debias, a progressive framework designed to instill intrinsic self-correction capabilities. Specifically, we reformulate the debiasing process as a strategic resource redistribution problem, treating the model's output probability mass as a limited resource to be reallocated from biased heuristics to unbiased reasoning paths. Unlike standard preference optimization, which applies broad penalties, Self-Debias employs a fine-grained trajectory-level objective subject to dynamic…
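The "resource redistribution" framing can be illustrated with a toy example. The sketch below is not the paper's objective, which is truncated above and not reproduced here. It only shows what it means to treat probability mass as a fixed budget: a chosen fraction of the mass sitting on paths flagged as biased is moved to the unbiased paths, and the total stays at 1. The function name `reallocate_mass`, the `biased` flags, and the `fraction` parameter are all hypothetical and introduced purely for illustration.

```python
from typing import List


def reallocate_mass(probs: List[float], biased: List[bool], fraction: float) -> List[float]:
    """Toy illustration (hypothetical, not the paper's method).

    Moves `fraction` of the probability mass on biased paths to the
    unbiased paths, in proportion to each unbiased path's current mass.
    """
    if len(probs) != len(biased):
        raise ValueError("probs and biased must have the same length")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")

    biased_mass = sum(p for p, b in zip(probs, biased) if b)
    unbiased_mass = sum(p for p, b in zip(probs, biased) if not b)
    if unbiased_mass == 0.0:
        # No unbiased path can receive the mass, so leave the distribution as is.
        return list(probs)

    moved = fraction * biased_mass                   # mass taken from biased paths
    scale_biased = 1.0 - fraction                    # each biased path shrinks
    scale_unbiased = 1.0 + moved / unbiased_mass     # each unbiased path grows
    return [p * (scale_biased if b else scale_unbiased) for p, b in zip(probs, biased)]


# Three candidate reasoning paths; the first is flagged as biased (assumed labels).
probs = [0.5, 0.3, 0.2]
biased = [True, False, False]
new_probs = reallocate_mass(probs, biased, fraction=0.4)
```

In this example, 0.2 of the total mass leaves the biased path and is split between the two unbiased paths in proportion to their original probabilities, so the distribution still sums to 1. A trained model would have to learn such a shift through an optimization objective rather than apply it directly to a known distribution.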
