Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO

Jinquan Zheng; Jia Yuan; Jiacheng Yao; Chenyang Gu; Pujun Zheng; Guoxiu He

arXiv:2603.21016·cs.CL·May 1, 2026

TL;DR

This paper introduces PA-GRPO, a permutation-aware training method for large language models that reduces selection bias by enforcing consistent answers across permutations of the same question, without sacrificing overall performance.

Contribution

The paper proposes PA-GRPO (Permutation-Aware Group Relative Policy Optimization), a training approach that mitigates selection bias in LLMs by enforcing permutation consistency during training.

Findings

1. PA-GRPO outperforms strong baselines on seven benchmarks.

2. It substantially reduces selection bias without sacrificing overall performance.

3. Experimental results validate the effectiveness of permutation-aware optimization.

Abstract

Large language models (LLMs) used for multiple-choice and pairwise evaluation tasks often exhibit selection bias due to non-semantic factors like option positions and label symbols. Existing inference-time debiasing is costly and may harm reasoning, while pointwise training ignores that the same question should yield consistent answers across permutations. To address this issue, we propose Permutation-Aware Group Relative Policy Optimization (PA-GRPO), which mitigates selection bias by enforcing permutation-consistent semantic reasoning. PA-GRPO constructs a permutation group for each instance by generating multiple candidate permutations, and optimizes the model using two complementary mechanisms: (1) cross-permutation advantage, which computes advantages relative to the mean reward over all permutations of the same instance, and (2) consistency-aware reward, which encourages the model…
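As a rough illustration of the two steps the abstract spells out in full (building a permutation group and computing a cross-permutation advantage), a minimal Python sketch might look like the following. The function names `build_permutation_group` and `cross_permutation_advantages`, the random sampling of orderings, and the choice to pool rewards from all sampled responses into a single mean are assumptions made for illustration; they are not taken from the paper or its repository. The sketch only subtracts the mean, as the abstract describes; whether the method also normalizes by a standard deviation is not stated here. The consistency-aware reward is not sketched because its description is truncated above.

```python
import math
import random
from statistics import mean


def build_permutation_group(options, num_permutations, seed=0):
    """Return distinct orderings of `options`, original order first.

    Hypothetical helper: extra orderings are drawn at random, and the
    group never exceeds the n! orderings that exist for n options.
    """
    rng = random.Random(seed)
    identity = tuple(range(len(options)))
    seen = {identity}
    group = [list(options)]
    limit = min(num_permutations, math.factorial(len(options)))
    while len(group) < limit:
        order = list(identity)
        rng.shuffle(order)
        if tuple(order) not in seen:  # skip orderings already in the group
            seen.add(tuple(order))
            group.append([options[i] for i in order])
    return group


def cross_permutation_advantages(rewards_by_permutation):
    """Subtract one shared baseline from every reward in the group.

    `rewards_by_permutation[p][k]` is the reward of the k-th sampled
    response under permutation p of the same instance. The baseline is
    the mean over all of them (an assumed reading of "mean reward over
    all permutations"), so a response is judged against the whole
    permutation group rather than only its own ordering.
    """
    all_rewards = [r for rewards in rewards_by_permutation for r in rewards]
    baseline = mean(all_rewards)
    return [[r - baseline for r in rewards] for rewards in rewards_by_permutation]


# Example: two permutations of one question, two sampled responses each.
group = build_permutation_group(["Paris", "Rome", "Oslo"], num_permutations=4)
advantages = cross_permutation_advantages([[1.0, 0.0], [1.0, 1.0]])
```

With the example rewards, the shared baseline is 0.75, so `advantages` is `[[0.25, -0.75], [0.25, 0.25]]`: the one incorrect response is penalized relative to the whole permutation group, not just relative to the other response under the same ordering.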

Code & Models

Repositories

ECNU-Text-Computing/PA-GRPO (GitHub)
