Multi-Reward GRPO Fine-Tuning for De-biasing Large Language Models: A Study Based on Chinese-Context Discrimination Data

Deng Yixuan; Ji Xiaoqiang

arXiv:2511.06023 · cs.CL · November 11, 2025

TL;DR

This paper introduces a Multi-Reward Group Relative Policy Optimization (GRPO) fine-tuning framework to reduce bias in large language models. It trains on a synthetic English-language dataset built from Chinese-context discrimination categories, aiming to improve ethical alignment without sacrificing language quality.

Contribution

The paper presents a multi-reward optimization approach to de-biasing LLMs that targets culturally specific biases by combining a synthetic dataset with multi-dimensional reward signals.
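
The abstract names three reward dimensions: fairness, neutrality, and linguistic quality. A policy-gradient method needs one scalar per response, and a normalized weighted sum is one common way to get it. The sketch below assumes that scheme; the weights and the `RewardScores` / `combine_rewards` names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RewardScores:
    """Per-response scores from a reward model (hypothetical dimensions in [0, 1])."""
    fairness: float
    neutrality: float
    quality: float


def combine_rewards(
    scores: RewardScores,
    w_fairness: float = 0.4,   # hypothetical weights, not from the paper
    w_neutrality: float = 0.4,
    w_quality: float = 0.2,
) -> float:
    """Collapse multi-dimensional reward scores into one scalar via a normalized weighted sum."""
    total = w_fairness + w_neutrality + w_quality
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (
        w_fairness * scores.fairness
        + w_neutrality * scores.neutrality
        + w_quality * scores.quality
    )
    return weighted / total


if __name__ == "__main__":
    biased_but_fluent = RewardScores(fairness=0.1, neutrality=0.2, quality=0.9)
    neutral_and_fluent = RewardScores(fairness=0.9, neutrality=0.8, quality=0.9)
    print(combine_rewards(biased_but_fluent), combine_rewards(neutral_and_fluent))
```

A smaller weight on quality limits how much fluency alone can offset a low fairness or neutrality score. The right balance is an empirical choice.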

Findings

1. Significant bias reduction in LLM outputs.
2. Improved ethical alignment without loss of fluency.
3. Effective use of multi-reward signals for bias mitigation.

Abstract

Large Language Models (LLMs) often exhibit implicit biases and discriminatory tendencies that reflect underlying social stereotypes. While recent alignment techniques such as RLHF and DPO have mitigated some of these issues, they remain limited in addressing culturally specific and multi-dimensional forms of discrimination. This paper proposes a Multi-Reward Group Relative Policy Optimization (GRPO) framework to fine-tune LLMs toward ethical and bias-free behavior. Our approach constructs a synthetic English-language dataset derived from Chinese-context discrimination categories, including regional, ethnic, and occupational biases. Each instance is paired with both neutral and biased responses to train a reward model based on DeBERTa-v3, which provides multi-dimensional reward signals capturing fairness, neutrality, and linguistic quality. The trained reward model then guides GRPO…
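
The abstract describes two training signals: a reward model trained on paired neutral and biased responses, and GRPO, which optimizes the policy against that reward model. Below is a minimal plain-Python sketch of both. The first is a Bradley-Terry-style pairwise loss. This is an assumption, because the abstract does not state the reward-model objective. The loss is small when the neutral response outscores the biased one. The second is a GRPO-style group-relative advantage, which standardizes each sampled response's reward against the mean and standard deviation of its group. Function names are hypothetical, and a real setup would compute these on tensors inside a training loop.

```python
import math
from typing import Sequence


def pairwise_reward_loss(score_neutral: float, score_biased: float) -> float:
    """Bradley-Terry-style loss: -log(sigmoid(score_neutral - score_biased)).

    Assumed objective for training the reward model on (neutral, biased) pairs;
    it approaches 0 when the neutral response is scored well above the biased one.
    """
    margin = score_neutral - score_biased
    # Numerically stable softplus(-margin) == -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group.

    `rewards` holds the scalar rewards of G responses sampled for the same prompt.
    Uses the population standard deviation; implementations differ on this detail.
    """
    if not rewards:
        raise ValueError("need at least one reward in the group")
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Reward model prefers the neutral response: low loss; reversed preference: high loss.
    print(pairwise_reward_loss(2.0, -1.0), pairwise_reward_loss(-1.0, 2.0))
    # Four sampled responses to one prompt; above-average rewards get positive advantages.
    print(group_relative_advantages([0.86, 0.30, 0.55, 0.55]))
```

Because advantages are measured relative to the group, GRPO needs no separate value network. A response is pushed up only if it scores better than its siblings sampled for the same prompt.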

Taxonomy

Topics: Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI