BiasFilter: An Inference-Time Debiasing Framework for Large Language Models

Xiaoqing Cheng, Ruizhe Chen, Hongying Zan, Yuxiang Jia, and Min Peng

arXiv:2505.23829 · cs.CL · June 2, 2025

TL;DR

BiasFilter is an inference-time framework that reduces social bias in large language models (LLMs) by filtering their outputs during generation, without retraining or modifying the underlying models.

Contribution

The paper introduces BiasFilter, a model-agnostic, inference-time debiasing method that filters LLM outputs using a learned fairness reward and scales to large models and open-ended generation tasks.

Findings

1. Effectively reduces social bias across various LLMs
2. Preserves overall generation quality
3. Operates efficiently without retraining or model modification

Abstract

Mitigating social bias in large language models (LLMs) has become an increasingly important research objective. However, existing debiasing methods often incur high human and computational costs, exhibit limited effectiveness, and struggle to scale to larger models and open-ended generation tasks. To address these limitations, this paper proposes BiasFilter, a model-agnostic, inference-time debiasing framework that integrates seamlessly with both open-source and API-based LLMs. Instead of relying on retraining with balanced data or modifying model parameters, BiasFilter enforces fairness by filtering generation outputs in real time. Specifically, it periodically evaluates intermediate outputs every few tokens, maintains an active set of candidate continuations, and incrementally completes generation by discarding low-reward segments based on a fairness reward signal. To support this…
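The filtering loop described in the abstract (score partial outputs every few tokens, keep an active set of candidates, drop low-reward segments) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `bias_filter_decode`, `toy_propose`, and `toy_reward` are hypothetical names, a random token sampler stands in for the LLM, and a keyword-penalty function stands in for the learned fairness reward model. The parameters `interval`, `num_active`, and `samples_per_candidate` are illustrative knobs, not values from the paper.

```python
import random
from typing import Callable, List

Tokens = List[str]
# Hypothetical: propose(prefix, n, rng) -> n new tokens; stands in for LLM sampling.
Proposer = Callable[[Tokens, int, random.Random], Tokens]
# Hypothetical: reward(sequence) -> float; stands in for a learned fairness reward.
Reward = Callable[[Tokens], float]


def bias_filter_decode(
    prompt: Tokens,
    propose: Proposer,
    reward: Reward,
    max_tokens: int = 12,
    interval: int = 3,              # re-score candidates every `interval` tokens
    num_active: int = 2,            # size of the active candidate set
    samples_per_candidate: int = 4, # segments sampled per active candidate
    seed: int = 0,
) -> Tokens:
    """Sketch of reward-filtered, segment-wise decoding (assumed design)."""
    rng = random.Random(seed)
    active: List[Tokens] = [list(prompt)]
    generated = 0
    while generated < max_tokens:
        step = min(interval, max_tokens - generated)
        # Extend every active candidate with several sampled segments.
        pool = [
            cand + propose(cand, step, rng)
            for cand in active
            for _ in range(samples_per_candidate)
        ]
        # Keep the highest-reward continuations and discard the rest.
        pool.sort(key=reward, reverse=True)
        active = pool[:num_active]
        generated += step
    return max(active, key=reward)


# ---- Toy stand-ins (hypothetical, for illustration only) ----
VOCAB = ["people", "are", "kind", "capable", "lazy", "bad"]
FLAGGED = {"lazy", "bad"}


def toy_propose(prefix: Tokens, n: int, rng: random.Random) -> Tokens:
    # Ignores the prefix; samples n tokens uniformly from a tiny vocabulary.
    return [rng.choice(VOCAB) for _ in range(n)]


def toy_reward(seq: Tokens) -> float:
    # Higher is "fairer": subtract one point per flagged token.
    return -float(sum(tok in FLAGGED for tok in seq))


if __name__ == "__main__":
    out = bias_filter_decode(["they"], toy_propose, toy_reward)
    print(" ".join(out), toy_reward(out))
```

Because only the candidate pool is pruned, the generator itself is never modified, which mirrors the abstract's claim that the method works with both open-source and API-based models; in practice the cost grows with `num_active * samples_per_candidate` calls per scoring interval.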
