Fair-PP: A Synthetic Dataset for Aligning LLM with Personalized Preferences of Social Equity

Qi Zhou; Jie Zhang; Dongxia Wang; Qiang Liu; Tianlin Li; Jin Song Dong; Wenhai Wang; Qing Guo

arXiv:2505.11861 · cs.AI · May 20, 2025

TL;DR

This paper introduces Fair-PP, a synthetic dataset built from social survey data, for aligning large language models (LLMs) with personalized preferences on social equity. The dataset is produced by an automated data generation framework, and the paper also proposes a reweighting method for alignment.

Contribution

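The paper presents a novel synthetic dataset and an automated framework for generating personalized preference data. Together they address two limitations of existing preference datasets for social equity alignment: human feedback is costly to collect, and most datasets ignore how preferences vary with the person expressing them.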

Findings

1. Fair-PP effectively captures diverse social equity preferences.
2. The reweighting method improves LLM alignment with target personas.
3. Empirical results show the proposed approach outperforms baselines.
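The findings credit a reweighting method with improving alignment to target personas, but this page does not say how the weights are defined. As a minimal sketch under our own assumptions (not the authors' method), one common pattern is to scale each preference pair's Direct Preference Optimization (DPO) loss by a weight reflecting how close that pair's persona is to the target persona. The `PreferencePair` fields, the `similarity` function, the mean-one normalization, and `beta=0.1` are all hypothetical, and sequence log-probabilities are assumed to be precomputed.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    """Hypothetical preference record with precomputed sequence log-probs."""
    persona: str
    policy_chosen: float    # log pi_theta(chosen | prompt)
    policy_rejected: float  # log pi_theta(rejected | prompt)
    ref_chosen: float       # log pi_ref(chosen | prompt)
    ref_rejected: float     # log pi_ref(rejected | prompt)


def persona_weights(pairs: List[PreferencePair], target: str,
                    similarity: Callable[[str, str], float]) -> List[float]:
    """Weight each pair by persona similarity, rescaled so weights average to 1."""
    raw = [similarity(p.persona, target) for p in pairs]
    total = sum(raw)
    if total <= 0:
        raise ValueError("similarity scores must have a positive sum")
    return [r * len(raw) / total for r in raw]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) that avoids overflow in exp."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_dpo_loss(pairs: List[PreferencePair], weights: List[float],
                      beta: float = 0.1) -> float:
    """Mean of per-pair DPO losses, each scaled by its persona weight."""
    total = 0.0
    for p, w in zip(pairs, weights):
        # Implicit reward margin: how much more the policy (vs. reference)
        # favors the chosen response over the rejected one.
        margin = beta * ((p.policy_chosen - p.ref_chosen)
                         - (p.policy_rejected - p.ref_rejected))
        total += -w * log_sigmoid(margin)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical similarity: full weight for the target persona, 0.2 otherwise.
    sim = lambda a, b: 1.0 if a == b else 0.2
    pairs = [
        PreferencePair("persona_A", -10.0, -12.0, -11.0, -11.5),
        PreferencePair("persona_B", -9.0, -8.0, -9.5, -8.5),
    ]
    w = persona_weights(pairs, "persona_A", sim)
    print(w, weighted_dpo_loss(pairs, w))
```

Setting every weight to 1 recovers the standard DPO objective, so the weights only shift how much each persona's pairs contribute. How Fair-PP actually defines and normalizes its weights should be checked against the paper and the tools-only/FairPP repository.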

Abstract

Human preference plays a crucial role in the refinement of large language models (LLMs). However, collecting human preference feedback is costly and most existing datasets neglect the correlation between personalization and preferences. To address this issue, we introduce Fair-PP, a synthetic dataset of personalized preferences targeting social equity, derived from real-world social survey data, which includes 28 social groups, 98 equity topics, and 5 personal preference dimensions. Leveraging GPT-4o-mini, we engage in role-playing based on seven representative persona portrayals guided by existing social survey data, yielding a total of 238,623 preference records. Through Fair-PP, we also contribute (i) an automated framework for generating preference data, along with a more fine-grained dataset of personalized preferences; (ii) analysis of the positioning of the existing mainstream…

Code & Models

Repositories

tools-only/FairPP (PyTorch, official)

Datasets

tools-o/Fair-PP (dataset)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Recommender Systems and Techniques