MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion

Xin Guan; PeiHsin Lin; Zekun Wu; Ze Wang; Ruibo Zhang; Emre Kazim; Adriano Koshiyama

arXiv:2507.02595·cs.CL·July 4, 2025

TL;DR

MPF is a posttraining alignment framework that uses multiperspective generation to mitigate biases in large language models, aligning their outputs with humanlike baseline distributions without extensive finetuning.

Contribution

The paper introduces a novel multiperspective fusion approach for bias mitigation that is scalable, interpretable, and compatible with already deployed LLMs. The approach is built on the SAGED pipeline.

Findings

01. Successfully aligns sentiment distributions with counterfactual and HR baselines
02. Reduces calibration error and KL divergence in LLM outputs
03. Generalizes well to unseen questions

Abstract

Multiperspective Fusion (MPF) is a novel posttraining alignment framework for large language models (LLMs), developed in response to the growing need for easy bias mitigation. Built on top of the SAGED pipeline, an automated system for constructing bias benchmarks and extracting interpretable baseline distributions, MPF leverages multiperspective generations to expose biases in LLM outputs and align them with nuanced, humanlike baselines. By decomposing a baseline, such as the sentiment distribution from HR professionals, into interpretable perspective components, MPF guides generation through sampling and balancing of responses, weighted by the probabilities obtained in the decomposition. Empirically, we demonstrate its ability to align LLM sentiment distributions with both counterfactual baselines (absolute equality) and the HR baseline (biased for Top University), resulting in small KL…
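
The decompose-then-sample idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the excerpt does not say how MPF computes the decomposition, so a coarse grid search over mixture weights stands in for it here. The three perspective histograms, the baseline values, and all function names (`kl_divergence`, `mix`, `fit_weights`, `sample_perspectives`) are hypothetical and chosen only for illustration.

```python
import math
import random
from itertools import product


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def mix(weights, components):
    """Weighted mixture of per-perspective sentiment histograms."""
    n_bins = len(components[0])
    return [sum(w * c[b] for w, c in zip(weights, components))
            for b in range(n_bins)]


def fit_weights(baseline, components, steps=20):
    """Assumed stand-in for the decomposition: coarse grid search over
    the probability simplex, minimising KL(baseline || mixture)."""
    k = len(components)
    best_w, best_kl = None, float("inf")
    for combo in product(range(steps + 1), repeat=k - 1):
        if sum(combo) > steps:
            continue  # weights must sum to 1
        w = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        kl = kl_divergence(baseline, mix(w, components))
        if kl < best_kl:
            best_w, best_kl = w, kl
    return best_w, best_kl


def sample_perspectives(weights, n, seed=0):
    """Choose which perspective answers each of n prompts, using the
    fitted weights as sampling probabilities."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=n)


# Hypothetical 3-bin sentiment histograms: (negative, neutral, positive).
components = [
    [0.7, 0.2, 0.1],  # hypothetical "critical" perspective
    [0.2, 0.6, 0.2],  # hypothetical "neutral" perspective
    [0.1, 0.2, 0.7],  # hypothetical "optimistic" perspective
]
baseline = [0.27, 0.40, 0.33]  # made-up target, not data from the paper

weights, kl = fit_weights(baseline, components)
picks = sample_perspectives(weights, n=10)
print("weights:", weights, "KL:", kl)
print("perspective index per prompt:", picks)
```

The made-up baseline was constructed as the mixture with weights (0.2, 0.5, 0.3), so the grid search should recover those weights with a KL divergence near zero. A real baseline would generally not lie exactly in the span of the perspective components, and the residual KL would then measure how well the chosen perspectives can express it.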
