Compositional Bias Control in Large Language Models: Preference Learning Fails, Supervision Succeeds

Atij Mahesh

arXiv:2510.22084 · cs.CL · October 28, 2025


TL;DR

This paper compares six bias mitigation techniques for large language models and finds that explicit supervision outperforms preference learning at enforcing compositional constraints while maintaining naturalness.

Contribution

It provides a comparative analysis of six control methods, exposing the limitations of preference learning and showing the effectiveness of supervised fine-tuning for bias control.

Findings

01

Supervised fine-tuning achieves near-perfect constraint compliance.

02

Preference learning fails to enforce compositional constraints.

03

Explicit supervision maintains fluency and diversity.
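The findings above rest on two kinds of measurement: constraint compliance and output diversity. As a rough sketch, assuming compliance is the fraction of outputs that pass a constraint predicate and diversity is measured with distinct-n (a common proxy; the paper's exact metrics are not stated in this summary), both can be computed as follows. The function names `compliance_rate` and `distinct_n` and the toy predicate are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import Callable, Sequence


def compliance_rate(outputs: Sequence[str], check: Callable[[str], bool]) -> float:
    """Fraction of outputs for which the constraint predicate returns True."""
    if not outputs:
        return 0.0
    return sum(check(o) for o in outputs) / len(outputs)


def distinct_n(outputs: Sequence[str], n: int = 2) -> float:
    """Distinct-n: unique word n-grams divided by total word n-grams across outputs."""
    total = 0
    unique: set[tuple[str, ...]] = set()
    for text in outputs:
        words = text.lower().split()
        grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


# Toy usage with a hypothetical predicate requiring both "kind" and "decisive".
outputs = [
    "the doctor is kind and decisive",
    "the doctor is kind and decisive",
    "a nurse is confident",
]
toy_check = lambda s: "kind" in s and "decisive" in s
print(round(compliance_rate(outputs, toy_check), 3))  # → 0.667
print(round(distinct_n(outputs, n=2), 3))  # → 0.615
```

Repeating the first sentence lowers distinct-2 (8 unique bigrams out of 13), which is the kind of collapse a diversity metric is meant to expose even when compliance is high.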

Abstract

Large Language Models (LLMs) still produce gender-stereotyped language that reflects deep societal biases, even in occupation-neutral contexts (Rudinger et al., 2018). To address this, prior work has proposed prompting, constrained decoding (Dathathri et al., 2020; Zhou et al., 2024), post-processing, and fine-tuning-based alignment (Rafailov et al., 2023; Ravfogel et al., 2022). However, the comparative efficacy and learning dynamics of these methods remain poorly understood. We report a comparative analysis of six control techniques for bias mitigation: prompt-only, generate-and-filter, DFA-based Ctrl-G decoding, Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Iterative Nullspace Projection (INLP). We evaluate each method on a compositional constraint task, which requires generating sentences that contain at least one agentic and one communal descriptor for each of the twenty…
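The compositional constraint described in the abstract can be expressed as a simple predicate, and the generate-and-filter baseline is a rejection loop around that predicate. The sketch below assumes whole-word lexicon matching; the `AGENTIC` and `COMMUNAL` word lists and the functions `satisfies_constraint` and `generate_and_filter` are hypothetical illustrations, not the paper's lexicons or code, and `sample` stands in for any LLM sampling call.

```python
import re
from itertools import cycle
from typing import Callable, Optional

# Hypothetical descriptor lexicons; the paper's actual word lists are not given here.
AGENTIC = {"ambitious", "assertive", "confident", "decisive", "independent"}
COMMUNAL = {"caring", "compassionate", "helpful", "kind", "supportive"}


def tokens(sentence: str) -> set[str]:
    """Lowercased alphabetic word tokens of a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def satisfies_constraint(sentence: str) -> bool:
    """True if the sentence has at least one agentic AND one communal descriptor."""
    toks = tokens(sentence)
    return bool(toks & AGENTIC) and bool(toks & COMMUNAL)


def generate_and_filter(
    sample: Callable[[str], str], prompt: str, max_tries: int = 20
) -> Optional[str]:
    """Draw samples until one satisfies the constraint; return None if the budget runs out."""
    for _ in range(max_tries):
        candidate = sample(prompt)
        if satisfies_constraint(candidate):
            return candidate
    return None


# Toy usage: a fake sampler that cycles through canned outputs instead of calling an LLM.
canned = cycle(["The nurse was kind.", "The engineer was confident and caring."])
print(generate_and_filter(lambda p: next(canned), "Describe an engineer."))
# → The engineer was confident and caring.
```

A rejection loop only guarantees compliance for the samples it returns; when the budget is exhausted it yields nothing. DFA-based decoding such as Ctrl-G instead enforces the constraint during generation.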
